Preface


Number theory and algebra play an increasingly significant role in computing 
and communications, as evidenced by the striking applications of these subjects 
to such fields as cryptography and coding theory. My goal in writing this book 
was to provide an introduction to number theory and algebra, with an emphasis 
on algorithms and applications, that would be accessible to a broad audience. In 
particular, I wanted to write a book that would be appropriate for typical students in 
computer science or mathematics who have some amount of general mathematical 
experience, but without presuming too much specific mathematical knowledge. 

Prerequisites. The mathematical prerequisites are minimal: no particular
mathematical concepts beyond what is taught in a typical undergraduate calculus
sequence are assumed. 

The computer science prerequisites are also quite minimal: it is assumed that the 
reader is proficient in programming, and has had some exposure to the analysis of 
algorithms, essentially at the level of an undergraduate course on algorithms and 
data structures. 

Even though it is mathematically quite self-contained, the text does presuppose
that the reader is comfortable with mathematical formalism and also has
some experience in reading and writing mathematical proofs. Readers may have 
gained such experience in computer science courses such as algorithms, automata 
or complexity theory, or some type of “discrete mathematics for computer science 
students” course. They also may have gained such experience in undergraduate 
mathematics courses, such as abstract or linear algebra. The material in these
mathematics courses may overlap with some of the material presented here; however,
even if the reader already has had some exposure to this material, it nevertheless 
may be convenient to have all of the relevant topics easily accessible in one place; 
moreover, the emphasis and perspective here will no doubt be different from that 
in a traditional mathematical presentation of these subjects. 
Structure of the text. All of the mathematics required beyond basic calculus
is developed “from scratch.” Moreover, the book generally alternates between 
“theory” and “applications”: one or two chapters on a particular set of purely 
mathematical concepts are followed by one or two chapters on algorithms and 
applications; the mathematics provides the theoretical underpinnings for the
applications, while the applications both motivate and illustrate the mathematics. Of
course, this dichotomy between theory and applications is not perfectly
maintained: the chapters that focus mainly on applications include the development
of some of the mathematics that is specific to a particular application, and very
occasionally, some of the chapters that focus mainly on mathematics include a 
discussion of related algorithmic ideas as well. 

In developing the mathematics needed to discuss certain applications, I have 
tried to strike a reasonable balance between, on the one hand, presenting the
absolute minimum required to understand and rigorously analyze the applications, and
on the other hand, presenting a full-blown development of the relevant
mathematics. In striking this balance, I wanted to be fairly economical and concise, while at
the same time, I wanted to develop enough of the theory so as to present a fairly 
well-rounded account, giving the reader more of a feeling for the mathematical 
“big picture.” 

The mathematical material covered includes the basics of number theory 
(including unique factorization, congruences, the distribution of primes, and 
quadratic reciprocity) and of abstract algebra (including groups, rings, fields, and 
vector spaces). It also includes an introduction to discrete probability theory — this 
material is needed to properly treat the topics of probabilistic algorithms and
cryptographic applications. The treatment of all these topics is more or less standard,
except that the text only deals with commutative structures (i.e., abelian groups and 
commutative rings with unity) — this is all that is really needed for the purposes of 
this text, and the theory of these structures is much simpler and more transparent 
than that of more general, non-commutative structures. 

The choice of topics covered in this book was motivated primarily by their 
applicability to computing and communications, especially to the specific areas 
of cryptography and coding theory. Thus, the book may be useful for reference 
or self-study by readers who want to learn about cryptography, or it could also be 
used as a textbook in a graduate or upper-division undergraduate course on
(computational) number theory and algebra, perhaps geared towards computer science
students. 

Since this is an introduction, and not an encyclopedic reference for specialists, 
some topics simply could not be covered. One such, whose exclusion will
undoubtedly be lamented by some, is the theory of lattices, along with algorithms for and
applications of lattice basis reduction. Another omission is fast algorithms for 
integer and polynomial arithmetic; although some of the basic ideas of this topic
are developed in the exercises, the main body of the text deals only with classical,
quadratic-time algorithms for integer and polynomial arithmetic. However, there 
are more advanced texts that cover these topics perfectly well, and they should be 
readily accessible to students who have mastered the material in this book. 

Note that while continued fractions are not discussed, the closely related prob- 
lem of “rational reconstruction” is covered, along with a number of interesting 
applications (which could also be solved using continued fractions). 

Guidelines for using the text. 

• There are a few sections that are marked with a “(*),” indicating that the 
material covered in that section is a bit technical, and is not needed
elsewhere.

• There are many examples in the text, which form an integral part of the
book, and should not be skipped. 

• There are a number of exercises in the text that serve to reinforce, as well 
as to develop important applications and generalizations of, the material 
presented in the text. 

• Some exercises are underlined. These develop important (but usually
simple) facts, and should be viewed as an integral part of the book. It is highly
recommended that the reader work these exercises, or at the very least, read 
and understand their statements. 

• In solving exercises, the reader is free to use any previously stated results 
in the text, including those in previous exercises. However, except where 
otherwise noted, any result in a section marked with a “(*),” or in §5.5, 
need not and should not be used outside the section in which it appears.

• There is a very brief “Preliminaries” chapter, which fixes a bit of notation
and recalls a few standard facts. This should be skimmed over by the reader. 

• There is an appendix that contains a few useful facts; where such a fact is 
used in the text, there is a reference such as “see § An,” which refers to the 
item labeled “An” in the appendix. 

The second edition. In preparing this second edition, in addition to correcting 
errors in the first edition, I have also made a number of other modifications
(hopefully without introducing too many new errors). Many passages have been
rewritten to improve the clarity of exposition, and many new exercises and examples
have been added. Especially in the earlier chapters, the presentation is a bit more 
leisurely. Some material has been reorganized. Most notably, the chapter on
probability now follows the chapters on groups and rings; this allows a number of
examples and concepts in the probability chapter that depend on algebra to be 
more fully developed. Also, a number of topics have been moved forward in the
text, so as to enliven the material with exciting applications as soon as possible; 
for example, the RSA cryptosystem is now described right after Euclid’s algorithm 
is presented, and some basic results concerning quadratic residues are introduced 
right away, in the chapter on congruences. Finally, there are numerous changes
in notation and terminology; for example, the notion of a family of objects is 
now used consistently throughout the book (e.g., a pairwise independent family 
of random variables, a linearly independent family of vectors, a pairwise relatively 
prime family of integers, etc.). 

Feedback. I welcome comments on the book (suggestions for improvement, error 
reports, etc.) from readers. Please send your comments to 

victor@shoup.net.

There is also a web site where further material and information relating to the book 
(including a list of errata and the latest electronic version of the book) may be 
found: 

www.shoup.net/ntb.
Fazio, Rosario Gennaro, Mark Giesbrecht, Stuart Haber, Kristiyan Haralambiev,
Gene Itkis, Charanjit Jutla, Jonathan Katz, Eike Kiltz, Alfred Menezes, Ilya 
Mironov, Phong Nguyen, Antonio Nicolosi, Roberto Oliveira, Leonid Reyzin,
Louis Salvail, Berry Schoenmakers, Hovav Shacham, Yair Sovran, Panos Toulis,
and Daniel Wichs. A very special thanks goes to George Stephanides, who
translated the first edition of the book into Greek and reviewed the entire book in
preparation for the second edition. I am also grateful to the National Science Foundation
for their support provided under grants CCR-0310297 and CNS-0716690. Finally,
thanks to David Tranah for all his help and advice, and to David and his colleagues 
at Cambridge University Press for their progressive attitudes regarding intellectual 
property and open access. 


New York, June 2008 


Victor Shoup 



Preliminaries 


We establish here some terminology, notation, and simple facts that will be used 
throughout the text. 


Logarithms and exponentials 

We write $\log x$ for the natural logarithm of $x$, and $\log_b x$ for the logarithm of $x$ to
the base $b$.

We write $e^x$ for the usual exponential function, where $e \approx 2.71828$ is the base of
the natural logarithm. We may also write $\exp[x]$ instead of $e^x$.


Sets and families 

We use standard set-theoretic notation: $\emptyset$ denotes the empty set; $x \in A$ means that
$x$ is an element, or member, of the set $A$; for two sets $A, B$, $A \subseteq B$ means that
$A$ is a subset of $B$ (with $A$ possibly equal to $B$), and $A \subsetneq B$ means that $A$ is a
proper subset of $B$ (i.e., $A \subseteq B$ but $A \neq B$). Further, $A \cup B$ denotes the union of
$A$ and $B$, $A \cap B$ the intersection of $A$ and $B$, and $A \setminus B$ the set of all elements of
$A$ that are not in $B$. If $A$ is a set with a finite number of elements, then we write
$|A|$ for its size, or cardinality. We use standard notation for describing sets; for
example, if we define the set $S := \{-2, -1, 0, 1, 2\}$, then $\{x^2 : x \in S\} = \{0, 1, 4\}$
and $\{x \in S : x \text{ is even}\} = \{-2, 0, 2\}$.

We write $S_1 \times \cdots \times S_n$ for the Cartesian product of sets $S_1, \ldots, S_n$, which is
the set of all $n$-tuples $(a_1, \ldots, a_n)$, where $a_i \in S_i$ for $i = 1, \ldots, n$. We write $S^{\times n}$ for
the Cartesian product of $n$ copies of a set $S$, and for $x \in S$, we write $x^{\times n}$ for the
element of $S^{\times n}$ consisting of $n$ copies of $x$. (This notation is a bit non-standard,
but we reserve the more standard notation $S^n$ for other purposes, so as to avoid
ambiguity.)
A family is a collection of objects, indexed by some set $I$, called an index set.
If for each $i \in I$ we have an associated object $x_i$, the family of all such objects
is denoted by $\{x_i\}_{i \in I}$. Unlike a set, a family may contain duplicates; that is, we
may have $x_i = x_j$ for some pair of indices $i, j$ with $i \neq j$. Note that while $\{x_i\}_{i \in I}$
denotes a family, $\{x_i : i \in I\}$ denotes the set whose members are the (distinct)
$x_i$'s. If the index set $I$ has some natural order, then we may view the family $\{x_i\}_{i \in I}$
as being ordered in the same way; as a special case, a family indexed by a set of
integers of the form $\{m, \ldots, n\}$ or $\{m, m+1, \ldots\}$ is a sequence, which we may write
as $\{x_i\}_{i=m}^{n}$ or $\{x_i\}_{i=m}^{\infty}$. On occasion, if the choice of index set is not important, we
may simply define a family by listing or describing its members, without explicitly
describing an index set; for example, the phrase “the family of objects $a, b, c$” may
be interpreted as “the family $\{x_i\}_{i=1}^{3}$, where $x_1 := a$, $x_2 := b$, and $x_3 := c$.”

Unions and intersections may be generalized to arbitrary families of sets. For a
family $\{S_i\}_{i \in I}$ of sets, the union is

$$\bigcup_{i \in I} S_i := \{x : x \in S_i \text{ for some } i \in I\},$$

and for $I \neq \emptyset$, the intersection is

$$\bigcap_{i \in I} S_i := \{x : x \in S_i \text{ for all } i \in I\}.$$

Note that if $I = \emptyset$, the union is by definition $\emptyset$, but the intersection is, in general,
not well defined. However, in certain applications, one might define it by a
special convention; for example, if all sets under consideration are subsets of some
“ambient space” $\Omega$, then the empty intersection is usually taken to be $\Omega$.

Two sets $A$ and $B$ are called disjoint if $A \cap B = \emptyset$. A family $\{S_i\}_{i \in I}$ of sets is
called pairwise disjoint if $S_i \cap S_j = \emptyset$ for all $i, j \in I$ with $i \neq j$. A pairwise disjoint
family of non-empty sets whose union is $S$ is called a partition of $S$; equivalently,
$\{S_i\}_{i \in I}$ is a partition of a set $S$ if each $S_i$ is a non-empty subset of $S$, and each
element of $S$ belongs to exactly one $S_i$.


Numbers 

We use standard notation for various sets of numbers: 

$\mathbb{Z}$ := the set of integers = $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$,

$\mathbb{Q}$ := the set of rational numbers = $\{a/b : a, b \in \mathbb{Z},\ b \neq 0\}$,

$\mathbb{R}$ := the set of real numbers,

$\mathbb{C}$ := the set of complex numbers.
We sometimes use the symbols $\infty$ and $-\infty$ in simple arithmetic expressions
involving real numbers. The interpretation given to such expressions should be
obvious: for example, for every $x \in \mathbb{R}$, we have $-\infty < x < \infty$, $x + \infty = \infty$,
$x - \infty = -\infty$, $\infty + \infty = \infty$, and $(-\infty) + (-\infty) = -\infty$. Expressions such as
$x \cdot (\pm\infty)$ also make sense, provided $x \neq 0$. However, the expressions $\infty - \infty$ and
$0 \cdot \infty$ have no sensible interpretation.

We use standard notation for specifying intervals of real numbers: for $a, b \in \mathbb{R}$
with $a \le b$,

$$[a, b] := \{x \in \mathbb{R} : a \le x \le b\}, \qquad (a, b) := \{x \in \mathbb{R} : a < x < b\},$$

$$[a, b) := \{x \in \mathbb{R} : a \le x < b\}, \qquad (a, b] := \{x \in \mathbb{R} : a < x \le b\}.$$

As usual, this notation is extended to allow $a = -\infty$ for the intervals $(a, b]$ and
$(a, b)$, and $b = \infty$ for the intervals $[a, b)$ and $(a, b)$.


Functions 

We write $f : A \to B$ to indicate that $f$ is a function (also called a map) from
a set $A$ to a set $B$. If $A' \subseteq A$, then $f(A') := \{f(a) : a \in A'\}$ is the image of
$A'$ under $f$, and $f(A)$ is simply referred to as the image of $f$; if $B' \subseteq B$, then
$f^{-1}(B') := \{a \in A : f(a) \in B'\}$ is the pre-image of $B'$ under $f$.

A function $f : A \to B$ is called one-to-one or injective if $f(a) = f(b)$ implies
$a = b$. The function $f$ is called onto or surjective if $f(A) = B$. The function $f$
is called bijective if it is both injective and surjective; in this case, $f$ is called a
bijection, or a one-to-one correspondence. If $f$ is bijective, then we may define
the inverse function $f^{-1} : B \to A$, where for $b \in B$, $f^{-1}(b)$ is defined to be
the unique $a \in A$ such that $f(a) = b$; in this case, $f^{-1}$ is also a bijection, and

$$(f^{-1})^{-1} = f.$$

If $A' \subseteq A$, then the inclusion map from $A'$ to $A$ is the function $i : A' \to A$ given
by $i(a) := a$ for $a \in A'$; when $A' = A$, this is called the identity map on $A$. If
$A' \subseteq A$, $f' : A' \to B$, $f : A \to B$, and $f'(a) = f(a)$ for all $a \in A'$, then we say
that $f'$ is the restriction of $f$ to $A'$, and that $f$ is an extension of $f'$ to $A$.

If $f : A \to B$ and $g : B \to C$ are functions, their composition is the function
$g \circ f : A \to C$ given by $(g \circ f)(a) := g(f(a))$ for $a \in A$. If $f : A \to B$ is a
bijection, then $f^{-1} \circ f$ is the identity map on $A$, and $f \circ f^{-1}$ is the identity map on
$B$. Conversely, if $f : A \to B$ and $g : B \to A$ are functions such that $g \circ f$ is the
identity map on $A$ and $f \circ g$ is the identity map on $B$, then $f$ and $g$ are bijections,
each being the inverse of the other. If $f : A \to B$ and $g : B \to C$ are bijections,
then so is $g \circ f$, and $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.

Function composition is associative; that is, for all functions $f : A \to B$,
$g : B \to C$, and $h : C \to D$, we have $(h \circ g) \circ f = h \circ (g \circ f)$. Thus, we
can simply write $h \circ g \circ f$ without any ambiguity. More generally, if we have
functions $f_i : A_i \to A_{i+1}$ for $i = 1, \ldots, n$, where $n \ge 2$, then we may write their
composition as $f_n \circ \cdots \circ f_1$ without any ambiguity. If each $f_i$ is a bijection, then so
is $f_n \circ \cdots \circ f_1$, its inverse being $f_1^{-1} \circ \cdots \circ f_n^{-1}$. As a special case of this, if $A_i = A$
and $f_i = f$ for $i = 1, \ldots, n$, then we may write $f_n \circ \cdots \circ f_1$ as $f^n$. It is understood
that $f^1 = f$, and that $f^0$ is the identity map on $A$. If $f$ is a bijection, then so is $f^n$
for every non-negative integer $n$, the inverse function of $f^n$ being $(f^{-1})^n$, which
one may simply write as $f^{-n}$.
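To make these conventions concrete, here is a minimal Python sketch that models maps as Python callables on the integers (an assumption made for illustration; the helper names `compose`, `power`, `succ`, `pred`, and `neg` are hypothetical). It shows that the rightmost map is applied first, that the empty composition is the identity map, and that $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ for two bijections.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def compose(*fs: Callable) -> Callable:
    """Given f_n, ..., f_1 in that order, return the map f_n o ... o f_1."""
    def composed(x):
        for f in reversed(fs):      # f_1 is applied first, f_n last
            x = f(x)
        return x
    return composed


def power(f: Callable[[T], T], n: int) -> Callable[[T], T]:
    """Return f^n for n >= 0; f^0 = compose() is the identity map."""
    if n < 0:
        raise ValueError("n must be non-negative; for f^(-n), use power(f_inverse, n)")
    return compose(*([f] * n))


def succ(x: int) -> int:            # a bijection on the integers ...
    return x + 1


def pred(x: int) -> int:            # ... and its inverse
    return x - 1


def neg(x: int) -> int:             # a bijection that is its own inverse
    return -x


print(compose(neg, succ)(4))                      # -5, since (neg o succ)(4) = neg(5)
print(compose(pred, neg)(compose(neg, succ)(4)))  # 4, since (g o f)^(-1) = f^(-1) o g^(-1)
print(power(pred, 3)(power(succ, 3)(10)))         # 10, since f^(-3) = (f^(-1))^3
```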

If $f : I \to S$ is a function, then we may view $f$ as the family $\{x_i\}_{i \in I}$, where
$x_i := f(i)$. Conversely, a family $\{x_i\}_{i \in I}$, where all of the $x_i$'s belong to some set
$S$, may be viewed as the function $f : I \to S$ given by $f(i) := x_i$ for $i \in I$. Really,
functions and families are the same thing, the difference being just one of notation
and emphasis.


Binary operations 

A binary operation $\star$ on a set $S$ is a function from $S \times S$ to $S$, where the value
of the function at $(a, b) \in S \times S$ is denoted $a \star b$.

A binary operation $\star$ on $S$ is called associative if for all $a, b, c \in S$, we have
$(a \star b) \star c = a \star (b \star c)$. In this case, we can simply write $a \star b \star c$ without
any ambiguity. More generally, for $a_1, \ldots, a_n \in S$, where $n \ge 2$, we can write
$a_1 \star \cdots \star a_n$ without any ambiguity.

A binary operation $\star$ on $S$ is called commutative if for all $a, b \in S$, we have
$a \star b = b \star a$. If the binary operation $\star$ is both associative and commutative, then not
only is the expression $a_1 \star \cdots \star a_n$ unambiguous, but its value remains unchanged
even if we re-order the $a_i$'s.

If $\star$ is a binary operation on $S$, and $S' \subseteq S$, then $S'$ is called closed under $\star$ if
$a \star b \in S'$ for all $a, b \in S'$.
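For a finite set $S$, the three properties just defined can be checked by brute force. The following Python sketch assumes the operation is given as a two-argument Python function; the helper names (`is_associative`, `is_commutative`, `is_closed`, `add6`, `sub6`) are hypothetical, and `%` is Python's remainder operator, which for the modulus 6 yields a value in $\{0, \ldots, 5\}$. The checks cost $|S|^2$ or $|S|^3$ evaluations, so they are only practical for small sets.

```python
from itertools import product
from typing import Callable, Iterable

Op = Callable[[int, int], int]


def is_associative(op: Op, s: Iterable[int]) -> bool:
    """Check (a * b) * c == a * (b * c) for all a, b, c in s."""
    elems = list(s)
    return all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(elems, repeat=3))


def is_commutative(op: Op, s: Iterable[int]) -> bool:
    """Check a * b == b * a for all a, b in s."""
    elems = list(s)
    return all(op(a, b) == op(b, a) for a, b in product(elems, repeat=2))


def is_closed(op: Op, subset: Iterable[int]) -> bool:
    """Check that a * b lies in subset for all a, b in subset."""
    elems = set(subset)
    return all(op(a, b) in elems for a, b in product(elems, repeat=2))


def add6(a: int, b: int) -> int:
    """Addition of remainders modulo 6: a binary operation on S = {0, ..., 5}."""
    return (a + b) % 6


def sub6(a: int, b: int) -> int:
    """Subtraction of remainders modulo 6: also a binary operation on S."""
    return (a - b) % 6


S = range(6)
print(is_associative(add6, S), is_commutative(add6, S))     # True True
print(is_associative(sub6, S), is_commutative(sub6, S))     # False False
print(is_closed(add6, {0, 2, 4}), is_closed(add6, {0, 1}))  # True False
```

For example, subtraction fails associativity because $(0 - 1) - 1 \equiv 4$ while $0 - (1 - 1) = 0$.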
Basic properties of the integers


This chapter discusses some of the basic properties of the integers, including the 
notions of divisibility and primality, unique factorization into primes, greatest com- 
mon divisors, and least common multiples. 

1.1 Divisibility and primality 

A central concept in number theory is divisibility. 

Consider the integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. For $a, b \in \mathbb{Z}$, we say that $a$
divides $b$ if $az = b$ for some $z \in \mathbb{Z}$. If $a$ divides $b$, we write $a \mid b$, and we may say
that $a$ is a divisor of $b$, or that $b$ is a multiple of $a$, or that $b$ is divisible by $a$. If $a$
does not divide $b$, then we write $a \nmid b$.

We first state some simple facts about divisibility: 

Theorem 1.1. For all $a, b, c \in \mathbb{Z}$, we have

(i) $a \mid a$, $1 \mid a$, and $a \mid 0$;

(ii) $0 \mid a$ if and only if $a = 0$;

(iii) $a \mid b$ if and only if $-a \mid b$ if and only if $a \mid -b$;

(iv) $a \mid b$ and $a \mid c$ implies $a \mid (b + c)$;

(v) $a \mid b$ and $b \mid c$ implies $a \mid c$.

Proof. These properties can be easily derived from the definition of divisibility,
using elementary algebraic properties of the integers. For example, $a \mid a$ because
we can write $a \cdot 1 = a$; $1 \mid a$ because we can write $1 \cdot a = a$; $a \mid 0$ because we can
write $a \cdot 0 = 0$. We leave it as an easy exercise for the reader to verify the remaining
properties. □
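The definition of divisibility translates directly into code. In the Python sketch below (the name `divides` is ours), the case $a = 0$ is handled separately, since Python's `%` operator rejects a zero divisor; for $a \neq 0$, `b % a == 0` holds exactly when $a \mid b$, whatever the signs. The loop then spot-checks properties (i) through (v) of Theorem 1.1 on a small range of integers, which illustrates the theorem but of course does not prove it.

```python
def divides(a: int, b: int) -> bool:
    """Return True iff a | b, that is, a * z == b for some integer z."""
    if a == 0:
        # a * z == 0 for every z, so 0 | b exactly when b == 0.
        return b == 0
    # For a != 0, Python's b % a is 0 exactly when a divides b (for either sign).
    return b % a == 0


R = range(-6, 7)
for a in R:
    assert divides(a, a) and divides(1, a) and divides(a, 0)         # (i)
    assert divides(0, a) == (a == 0)                                  # (ii)
    for b in R:
        assert divides(a, b) == divides(-a, b) == divides(a, -b)      # (iii)
        for c in R:
            if divides(a, b) and divides(a, c):
                assert divides(a, b + c)                              # (iv)
            if divides(a, b) and divides(b, c):
                assert divides(a, c)                                  # (v)
```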

We make a simple observation: if $a \mid b$ and $b \neq 0$, then $1 \le |a| \le |b|$. Indeed,
if $az = b \neq 0$ for some integer $z$, then $a \neq 0$ and $z \neq 0$; it follows that $|a| \ge 1$,
$|z| \ge 1$, and so $|a| \le |a||z| = |b|$.
Theorem 1.2. For all $a, b \in \mathbb{Z}$, we have $a \mid b$ and $b \mid a$ if and only if $a = \pm b$. In
particular, for every $a \in \mathbb{Z}$, we have $a \mid 1$ if and only if $a = \pm 1$.

Proof. Clearly, if $a = \pm b$, then $a \mid b$ and $b \mid a$. So let us assume that $a \mid b$ and
$b \mid a$, and prove that $a = \pm b$. If either of $a$ or $b$ is zero, then the other must be zero
as well. So assume that neither is zero. By the above observation, $a \mid b$ implies
$|a| \le |b|$, and $b \mid a$ implies $|b| \le |a|$; thus, $|a| = |b|$, and so $a = \pm b$. That proves the
first statement. The second statement follows from the first by setting $b := 1$, and
noting that $1 \mid a$. □

The product of any two non-zero integers is again non-zero. This implies the
usual cancellation law: if $a$, $b$, and $c$ are integers such that $a \neq 0$ and $ab = ac$, then
we must have $b = c$; indeed, $ab = ac$ implies $a(b - c) = 0$, and so $a \neq 0$ implies
$b - c = 0$, and hence $b = c$.

Primes and composites. Let n be a positive integer. Trivially, 1 and n divide n. 
If n > 1 and no other positive integers besides 1 and n divide n, then we say n is
prime. If n > 1 but n is not prime, then we say that n is composite. The number 1 
is not considered to be either prime or composite. Evidently, n is composite if and 
only if n = ab for some integers a, b with 1 < a < n and 1 < b < n. The first few 
primes are 

2,3,5,7,11,13,17,.... 

While it is possible to extend the definition of prime and composite to negative 
integers, we shall not do so in this text: whenever we speak of a prime or composite 
number, we mean a positive integer. 
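A direct, if slow, way to test this definition is trial division. The following Python sketch (the name `is_prime` is ours) only tries candidate divisors $d$ with $d^2 \le n$; stopping there is justified by the fact, stated in Exercise 1.2 at the end of this section, that every composite $n$ has a prime divisor $p \le n^{1/2}$.

```python
def is_prime(n: int) -> bool:
    """Return True iff n is prime, i.e., n > 1 and its only positive divisors are 1 and n."""
    if n < 2:
        return False                 # 1 (and anything smaller) is not prime
    d = 2
    while d * d <= n:                # a composite n has a divisor d with 1 < d <= sqrt(n)
        if n % d == 0:
            return False             # n = d * (n // d) with 1 < d < n: composite
        d += 1
    return True


print([n for n in range(1, 20) if is_prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```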

A basic fact is that every non-zero integer can be expressed as a signed product
of primes in an essentially unique way. More precisely: 

Theorem 1.3 (Fundamental theorem of arithmetic). Every non-zero integer n 
can be expressed as 

$$n = \pm p_1^{e_1} \cdots p_r^{e_r},$$

where $p_1, \ldots, p_r$ are distinct primes and $e_1, \ldots, e_r$ are positive integers. Moreover,
this expression is unique, up to a reordering of the primes. 

Note that if $n = \pm 1$ in the above theorem, then $r = 0$, and the product of zero
terms is interpreted (as usual) as 1.

The theorem intuitively says that the primes act as the “building blocks” out 
of which all non-zero integers can be formed by multiplication (and negation). 
The reader may be so familiar with this fact that he may feel it is somehow
“self-evident,” requiring no proof; however, this feeling is simply a delusion, and most
of the rest of this section and the next are devoted to developing a proof of this
theorem. We shall give a quite leisurely proof, introducing a number of other very 
important tools and concepts along the way that will be useful later. 

To prove Theorem 1.3, we may clearly assume that n is positive, since otherwise, 
we may multiply n by -1 and reduce to the case where n is positive.

The proof of the existence part of Theorem 1.3 is easy. This amounts to showing
that every positive integer n can be expressed as a product (possibly empty) of 
primes. We may prove this by induction on n. If n = 1, the statement is true, as 
n is the product of zero primes. Now let n > 1, and assume that every positive 
integer smaller than n can be expressed as a product of primes. If n is a prime, 
then the statement is true, as n is the product of one prime. Assume, then, that n 
is composite, so that there exist $a, b \in \mathbb{Z}$ with $1 < a < n$, $1 < b < n$, and $n = ab$.
By the induction hypothesis, both a and b can be expressed as a product of primes, 
and so the same holds for n. 
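The existence argument is constructive, so it can be run as a recursive procedure. The Python sketch below (helper names `smallest_divisor` and `factor` are hypothetical) follows the induction: $n = 1$ is the empty product, a prime is a product of one prime, and a composite $n$ is split as $n = ab$ with $1 < a, b < n$, both parts being factored recursively. It finds the split by trial division, which is fine for small inputs only, and it says nothing about uniqueness, which is the hard part.

```python
from collections import Counter
from math import prod


def smallest_divisor(n: int) -> int:
    """Smallest d >= 2 dividing n, for n >= 2 (equal to n exactly when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def factor(n: int) -> Counter:
    """Return a Counter {p: e} of primes and exponents whose product is n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return Counter()                    # the empty product of primes
    a = smallest_divisor(n)
    if a == n:
        return Counter({n: 1})              # n is prime: a product of one prime
    # n is composite: n = a * b with 1 < a, b < n; apply the induction hypothesis.
    return factor(a) + factor(n // a)


f = factor(360)
print(sorted(f.items()))                    # [(2, 3), (3, 2), (5, 1)]
assert prod(p ** e for p, e in f.items()) == 360
```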

The uniqueness part of Theorem 1.3 is the hard part. An essential ingredient in
this proof is the following: 

Theorem 1.4 (Division with remainder property). Let $a, b \in \mathbb{Z}$ with $b > 0$.
Then there exist unique $q, r \in \mathbb{Z}$ such that $a = bq + r$ and $0 \le r < b$.


Proof. Consider the set $S$ of non-negative integers of the form $a - bt$ with $t \in \mathbb{Z}$.
This set is clearly non-empty; indeed, if $a \ge 0$, set $t := 0$, and if $a < 0$, set $t := a$.
Since every non-empty set of non-negative integers contains a minimum, we define
$r$ to be the smallest element of $S$. By definition, $r$ is of the form $r = a - bq$ for
some $q \in \mathbb{Z}$, and $r \ge 0$. Also, we must have $r < b$, since otherwise, $r - b$ would be
an element of $S$ smaller than $r$, contradicting the minimality of $r$; indeed, if $r \ge b$,
then we would have $0 \le r - b = a - b(q + 1)$.

That proves the existence of $r$ and $q$. For uniqueness, suppose that $a = bq + r$
and $a = bq' + r'$, where $0 \le r < b$ and $0 \le r' < b$. Then subtracting these two
equations and rearranging terms, we obtain

$$r' - r = b(q - q').$$

Thus, $r' - r$ is a multiple of $b$; however, $0 \le r < b$ and $0 \le r' < b$ implies
$|r' - r| < b$; therefore, the only possibility is $r' - r = 0$. Moreover, $0 = b(q - q')$
and $b \neq 0$ implies $q - q' = 0$. □

Theorem 1.4 can be visualized as follows: starting with $a$, we subtract (or add, if $a$ is
negative) the value $b$ until we end up with a number in the interval $[0, b)$.
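Read as an algorithm, the existence half of the proof of Theorem 1.4 carries out exactly this process. The Python sketch below (the name `div_with_remainder` is ours) starts from the non-negative element $a - bt$ of $S$ chosen in the proof ($t := 0$ or $t := a$) and subtracts $b$ until the remainder drops below $b$. Its running time grows with $|a|$, so it only illustrates the proof; by uniqueness, any correct method must return the same pair, and the final loop confirms that Python's built-in `divmod` agrees for $b > 0$.

```python
def div_with_remainder(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    q = 0 if a >= 0 else a       # the proof's choice of t: a - b*q >= 0 in both cases
    r = a - b * q
    while r >= b:                # r - b would be a smaller element of S
        r -= b
        q += 1
    return q, r


print(div_with_remainder(17, 5))   # (3, 2):  17 = 5*3 + 2
print(div_with_remainder(-7, 3))   # (-3, 2): -7 = 3*(-3) + 2

# Uniqueness means any correct method must agree, e.g. Python's divmod for b > 0.
for a in range(-30, 31):
    for b in range(1, 8):
        assert div_with_remainder(a, b) == divmod(a, b)
```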

Floors and ceilings. Let us briefly recall the usual floor and ceiling functions,
denoted $\lfloor \cdot \rfloor$ and $\lceil \cdot \rceil$, respectively. These are functions from $\mathbb{R}$ (the real numbers)
to $\mathbb{Z}$. For $x \in \mathbb{R}$, $\lfloor x \rfloor$ is the greatest integer $m \le x$; equivalently, $\lfloor x \rfloor$ is the unique
integer $m$ such that $m \le x < m + 1$, or put another way, such that $x = m + \epsilon$ for
some $\epsilon \in [0, 1)$. Also, $\lceil x \rceil$ is the smallest integer $m \ge x$; equivalently, $\lceil x \rceil$ is the
unique integer $m$ such that $m - 1 < x \le m$, or put another way, such that $x = m - \epsilon$
for some $\epsilon \in [0, 1)$.
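When $x = a/b$ is a quotient of integers, both functions can be computed exactly with integer arithmetic, avoiding floating-point rounding. In Python, the `//` operator already rounds toward minus infinity, so it gives the floor; the ceiling then follows from the identity $\lceil y \rceil = -\lfloor -y \rfloor$. The helper names `floor_div` and `ceil_div` are hypothetical, and the checks compare them against `math.floor` and `math.ceil` applied to exact `Fraction` values.

```python
import math
from fractions import Fraction


def floor_div(a: int, b: int) -> int:
    """Exact floor(a / b) for integers a, b with b != 0."""
    return a // b                    # // rounds toward minus infinity


def ceil_div(a: int, b: int) -> int:
    """Exact ceil(a / b), via ceil(y) = -floor(-y)."""
    return -((-a) // b)


for a in range(-12, 13):
    for b in (-4, -3, -2, -1, 1, 2, 3, 4):
        x = Fraction(a, b)
        m, n = floor_div(a, b), ceil_div(a, b)
        assert m == math.floor(x) and n == math.ceil(x)
        assert 0 <= x - m < 1        # x = m + eps with eps in [0, 1)
        assert 0 <= n - x < 1        # x = n - eps with eps in [0, 1)

# Floating point loses precision for large integers; exact arithmetic does not.
big = 10**17 + 1
print(math.floor(big / 1), floor_div(big, 1))   # 100000000000000000 100000000000000001
```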

The mod operator. Now let $a, b \in \mathbb{Z}$ with $b > 0$. If $q$ and $r$ are the unique integers
from Theorem 1.4 that satisfy $a = bq + r$ and $0 \le r < b$, we define

$$a \bmod b := r;$$

that is, $a \bmod b$ denotes the remainder in dividing $a$ by $b$. It is clear that $b \mid a$ if
and only if $a \bmod b = 0$. Dividing both sides of the equation $a = bq + r$ by $b$, we
obtain $a/b = q + r/b$. Since $q \in \mathbb{Z}$ and $r/b \in [0, 1)$, we see that $q = \lfloor a/b \rfloor$. Thus,

$$(a \bmod b) = a - b \lfloor a/b \rfloor.$$

One can use this equation to extend the definition of $a \bmod b$ to all integers $a$ and
$b$, with $b \neq 0$; that is, for $b < 0$, we simply define $a \bmod b$ to be $a - b \lfloor a/b \rfloor$.
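Because Python's `//` operator is floor division, the extended definition above coincides with Python's `%` operator for every non-zero $b$. This is specific to Python's semantics: in C, for instance, integer division truncates toward zero, so `-7 % 3` evaluates to -1 there rather than 2. The sketch below (the function name `mod` is ours) writes the formula out explicitly and checks it against `%` for both signs of $b$.

```python
def mod(a: int, b: int) -> int:
    """a mod b := a - b * floor(a / b), for any integer a and non-zero integer b."""
    if b == 0:
        raise ZeroDivisionError("a mod b is undefined for b == 0")
    return a - b * (a // b)          # a // b is floor(a / b) in Python


print(mod(-7, 3), mod(7, -3), mod(-7, -3))   # 2 -2 -1

for a in range(-20, 21):
    for b in range(-6, 7):
        if b == 0:
            continue
        r = mod(a, b)
        assert r == a % b                               # Python's % agrees
        assert (0 <= r < b) if b > 0 else (b < r <= 0)  # range depends on the sign of b
```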

Theorem 1.4 may be generalized so that when dividing an integer $a$ by a positive
integer $b$, the remainder is placed in an interval other than $[0, b)$. Let $x$ be any
real number, and consider the interval $[x, x + b)$. As the reader may easily verify,
this interval contains precisely $b$ integers, namely, $\lceil x \rceil, \ldots, \lceil x \rceil + b - 1$. Applying
Theorem 1.4 with $a - \lceil x \rceil$ in place of $a$, we obtain:

Theorem 1.5. Let $a, b \in \mathbb{Z}$ with $b > 0$, and let $x \in \mathbb{R}$. Then there exist unique
$q, r \in \mathbb{Z}$ such that $a = bq + r$ and $r \in [x, x + b)$.
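The derivation of Theorem 1.5 is directly computable. In the Python sketch below (the name `div_in_interval` is ours), $x$ is taken to be an exact `Fraction`, an assumption made to avoid floating-point issues: the procedure reduces $a - \lceil x \rceil$ into $[0, b)$ with `divmod` and then shifts the remainder back by $\lceil x \rceil$. Choosing $x = -b/2$ gives a "balanced" remainder in $[-b/2, b/2)$.

```python
import math
from fractions import Fraction


def div_in_interval(a: int, b: int, x: Fraction) -> tuple[int, int]:
    """Return the unique (q, r) with a == b*q + r and x <= r < x + b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    c = math.ceil(x)               # the integers in [x, x + b) are c, c + 1, ..., c + b - 1
    q, r0 = divmod(a - c, b)       # a - c == b*q + r0 with 0 <= r0 < b
    return q, r0 + c               # a == b*q + (r0 + c) and c <= r0 + c <= c + b - 1


half = Fraction(-5, 2)             # x = -b/2 for b = 5: remainders in [-5/2, 5/2)
print([div_in_interval(a, 5, half) for a in (6, 7, 8)])   # [(1, 1), (1, 2), (2, -2)]

for a in range(-25, 26):
    for b in range(1, 7):
        for x in (Fraction(0), Fraction(-b, 2), Fraction(7, 3)):
            q, r = div_in_interval(a, b, x)
            assert a == b * q + r and x <= r < x + b
```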


Exercise 1.1. Let $a, b, d \in \mathbb{Z}$ with $d \neq 0$. Show that $a \mid b$ if and only if $da \mid db$.

Exercise 1.2. Let $n$ be a composite integer. Show that there exists a prime $p$
dividing $n$, with $p \le n^{1/2}$.

Exercise 1.3. Let $m$ be a positive integer. Show that for every real number $x \ge 1$,
the number of multiples of $m$ in the interval $[1, x]$ is $\lfloor x/m \rfloor$; in particular, for every
integer $n \ge 1$, the number of multiples of $m$ among $1, \ldots, n$ is $\lfloor n/m \rfloor$.

Exercise 1.4. Let $x \in \mathbb{R}$. Show that $2\lfloor x \rfloor \le \lfloor 2x \rfloor \le 2\lfloor x \rfloor + 1$.

Exercise 1.5. Let $x \in \mathbb{R}$ and $n \in \mathbb{Z}$ with $n > 0$. Show that $\lfloor \lfloor x \rfloor / n \rfloor = \lfloor x/n \rfloor$; in
particular, $\lfloor \lfloor a/b \rfloor / c \rfloor = \lfloor a/(bc) \rfloor$ for all positive integers $a, b, c$.

Exercise 1.6. Let $a, b \in \mathbb{Z}$ with $b < 0$. Show that $(a \bmod b) \in (b, 0]$.

Exercise 1.7. Show that Theorem 1.5 also holds for the interval $(x, x + b]$. Does
it hold in general for the intervals $[x, x + b]$ or $(x, x + b)$?


1.2 Ideals and greatest common divisors 

To carry on with the proof of Theorem 1.3, we introduce the notion of an ideal of
$\mathbb{Z}$, which is a non-empty set of integers that is closed under addition, and closed
under multiplication by an arbitrary integer. That is, a non-empty set $I \subseteq \mathbb{Z}$ is an
ideal if and only if for all $a, b \in I$ and all $z \in \mathbb{Z}$, we have

$$a + b \in I \quad \text{and} \quad az \in I.$$

Besides its utility in proving Theorem 1.3, the notion of an ideal is quite useful in 
a number of contexts, which will be explored later. 

It is easy to see that every ideal $I$ contains 0: since $a \in I$ for some integer $a$,
we have $0 = a \cdot 0 \in I$. Also, note that if an ideal $I$ contains an integer $a$, it also
contains $-a$, since $-a = a \cdot (-1) \in I$. Thus, if an ideal contains $a$ and $b$, it also
contains $a - b$. It is clear that $\{0\}$ and $\mathbb{Z}$ are ideals. Moreover, an ideal $I$ is equal
to $\mathbb{Z}$ if and only if $1 \in I$; to see this, note that $1 \in I$ implies that for every $z \in \mathbb{Z}$,
we have $z = 1 \cdot z \in I$, and hence $I = \mathbb{Z}$; conversely, if $I = \mathbb{Z}$, then in particular,
$1 \in I$.

For $a \in \mathbb{Z}$, define $a\mathbb{Z} := \{az : z \in \mathbb{Z}\}$; that is, $a\mathbb{Z}$ is the set of all multiples of $a$.
If $a = 0$, then clearly $a\mathbb{Z} = \{0\}$; otherwise, $a\mathbb{Z}$ consists of the distinct integers

$$\ldots, -3a, -2a, -a, 0, a, 2a, 3a, \ldots.$$

It is easy to see that $a\mathbb{Z}$ is an ideal: for all $az, az' \in a\mathbb{Z}$ and $z'' \in \mathbb{Z}$, we have
$az + az' = a(z + z') \in a\mathbb{Z}$ and $(az)z'' = a(zz'') \in a\mathbb{Z}$. The ideal $a\mathbb{Z}$ is called
the ideal generated by $a$, and an ideal of the form $a\mathbb{Z}$ for some $a \in \mathbb{Z}$ is called a
principal ideal.

Observe that for all $a, b \in \mathbb{Z}$, we have $b \in a\mathbb{Z}$ if and only if $a \mid b$. Also
observe that for every ideal $I$, we have $b \in I$ if and only if $b\mathbb{Z} \subseteq I$. Both of
these observations are simple consequences of the definitions, as the reader may
verify. Combining these two observations, we see that $b\mathbb{Z} \subseteq a\mathbb{Z}$ if and only if $a \mid b$.

Suppose $I_1$ and $I_2$ are ideals. Then it is not hard to see that the set

$$I_1 + I_2 := \{a_1 + a_2 : a_1 \in I_1,\ a_2 \in I_2\}$$
is also an ideal. Indeed, suppose $a_1 + a_2 \in I_1 + I_2$ and $b_1 + b_2 \in I_1 + I_2$. Then we
have $(a_1 + a_2) + (b_1 + b_2) = (a_1 + b_1) + (a_2 + b_2) \in I_1 + I_2$, and for every $z \in \mathbb{Z}$,
we have $(a_1 + a_2)z = a_1 z + a_2 z \in I_1 + I_2$.

Example 1.1. Consider the principal ideal $3\mathbb{Z}$. This consists of all multiples of 3;
that is, $3\mathbb{Z} = \{\ldots, -9, -6, -3, 0, 3, 6, 9, \ldots\}$. □

Example 1.2. Consider the ideal $3\mathbb{Z} + 5\mathbb{Z}$. This ideal contains $3 \cdot 2 + 5 \cdot (-1) = 1$.
Since it contains 1, it contains all integers; that is, $3\mathbb{Z} + 5\mathbb{Z} = \mathbb{Z}$. □

Example 1.3. Consider the ideal $4\mathbb{Z} + 6\mathbb{Z}$. This ideal contains $4 \cdot (-1) + 6 \cdot 1 = 2$,
and therefore, it contains all even integers. It does not contain any odd integers,
since the sum of two even integers is again even. Thus, $4\mathbb{Z} + 6\mathbb{Z} = 2\mathbb{Z}$. □
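Since ideals such as $3\mathbb{Z} + 5\mathbb{Z}$ are infinite, a program can only inspect a finite part of them. The Python sketch below (the function `ideal_sample` and its parameters `bound` and `w` are hypothetical) lists the values $as + bt$ with $|s|, |t| \le$ `bound` that fall in the window $[-w, w]$; with `bound` large enough relative to `w`, the sample reproduces the patterns of Examples 1.2 and 1.3. A finite sample illustrates the examples but is not a proof.

```python
def ideal_sample(a: int, b: int, bound: int, w: int) -> list[int]:
    """Sorted values a*s + b*t with |s|, |t| <= bound that lie in [-w, w]."""
    coeffs = range(-bound, bound + 1)
    values = {a * s + b * t for s in coeffs for t in coeffs}
    return sorted(v for v in values if -w <= v <= w)


window = list(range(-10, 11))
print(ideal_sample(3, 5, 20, 10) == window)                             # True: 3Z + 5Z = Z
print(ideal_sample(4, 6, 20, 10) == [n for n in window if n % 2 == 0])  # True: 4Z + 6Z = 2Z
```

Here `bound = 20` suffices because each target value $n$ in the window has the explicit representations $n = 3(2n) + 5(-n)$ and, for even $n = 2m$, $n = 4(-m) + 6m$.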

In the previous two examples, we defined an ideal that turned out upon closer 
inspection to be a principal ideal. This was no accident: the following theorem 
says that all ideals of $\mathbb{Z}$ are principal.

Theorem 1.6. Let $I$ be an ideal of $\mathbb{Z}$. Then there exists a unique non-negative
integer $d$ such that $I = d\mathbb{Z}$.

Proof. We first prove the existence part of the theorem. If $I = \{0\}$, then $d = 0$
does the job, so let us assume that $I \neq \{0\}$. Since $I$ contains non-zero integers, it
must contain positive integers, since if $a \in I$ then so is $-a$. Let $d$ be the smallest
positive integer in $I$. We want to show that $I = d\mathbb{Z}$.

We first show that $I \subseteq d\mathbb{Z}$. To this end, let $a$ be any element in $I$. It suffices
to show that $d \mid a$. Using the division with remainder property, write $a = dq + r$,
where $0 \le r < d$. Then by the closure properties of ideals, one sees that $r = a - dq$
is also an element of $I$, and by the minimality of the choice of $d$, we must have
$r = 0$. Thus, $d \mid a$.

We have shown that $I \subseteq d\mathbb{Z}$. The fact that $d\mathbb{Z} \subseteq I$ follows from the fact that
$d \in I$. Thus, $I = d\mathbb{Z}$.

That proves the existence part of the theorem. For uniqueness, note that if
$d\mathbb{Z} = e\mathbb{Z}$ for some non-negative integer $e$, then $d \mid e$ and $e \mid d$, from which it
follows by Theorem 1.2 that $d = \pm e$; since $d$ and $e$ are non-negative, we must have
$d = e$. □

Greatest common divisors. For $a, b \in \mathbb{Z}$, we call $d \in \mathbb{Z}$ a common divisor of $a$
and $b$ if $d \mid a$ and $d \mid b$; moreover, we call such a $d$ a greatest common divisor of
$a$ and $b$ if $d$ is non-negative and all other common divisors of $a$ and $b$ divide $d$.

Theorem 1.7. For all $a, b \in \mathbb{Z}$, there exists a unique greatest common divisor $d$ of
$a$ and $b$, and moreover, $a\mathbb{Z} + b\mathbb{Z} = d\mathbb{Z}$.
Proof. We apply the previous theorem to the ideal I := aℤ + bℤ. Let d ∈ ℤ with
I = dℤ, as in that theorem. We wish to show that d is a greatest common divisor
of a and b. Note that a, b, d ∈ I and d is non-negative.

Since a ∈ I = dℤ, we see that d | a; similarly, d | b. So we see that d is a
common divisor of a and b.

Since d ∈ I = aℤ + bℤ, there exist s, t ∈ ℤ such that as + bt = d. Now suppose
a = a′d′ and b = b′d′ for some a′, b′, d′ ∈ ℤ. Then the equation as + bt = d implies
that d′(a′s + b′t) = d, which says that d′ | d. Thus, any common divisor d′ of a and
b divides d.

That proves that d is a greatest common divisor of a and b. For uniqueness, note
that if e is a greatest common divisor of a and b, then d | e and e | d, and hence
d = ±e; since both d and e are non-negative by definition, we have d = e. □

For a, b ∈ ℤ, we write gcd(a, b) for the greatest common divisor of a and b. We
say that a, b ∈ ℤ are relatively prime if gcd(a, b) = 1, which is the same as saying
that the only common divisors of a and b are ±1.

The following is essentially just a restatement of Theorem 1.7, but we state it 
here for emphasis: 

Theorem 1.8. Let a, b, r ∈ ℤ and let d := gcd(a, b). Then there exist s, t ∈ ℤ such
that as + bt = r if and only if d | r. In particular, a and b are relatively prime if
and only if there exist integers s and t such that as + bt = 1.

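Proof. We have

$$\begin{aligned} as + bt = r \text{ for some } s, t \in \mathbb{Z} &\iff r \in a\mathbb{Z} + b\mathbb{Z} \\ &\iff r \in d\mathbb{Z} \quad \text{(by Theorem 1.7)} \\ &\iff d \mid r. \end{aligned}$$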

That proves the first statement. The second statement follows from the first, setting 
r := 1. □ 

Note that as we have defined it, gcd(0, 0) = 0. Also note that when at least one
of a or b is non-zero, gcd(a, b) may be characterized as the largest positive integer
that divides both a and b, and as the smallest positive integer that can be expressed
as as + bt for integers s and t.
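The proof of Theorem 1.7 shows that s and t with as + bt = gcd(a, b) exist but does not say how to find them. One standard way to compute them, not developed in this section, is the extended Euclidean algorithm; the Python sketch below is a minimal version of it, with the hypothetical name `extended_gcd`. It maintains the invariant that every remainder it produces is an integer combination of a and b, so the last non-zero remainder (made non-negative) comes with its coefficients.

```python
def extended_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) and a*s + b*t = d.

    Invariant: r0 == a*s0 + b*t0 and r1 == a*s1 + b*t1 throughout.
    """
    r0, s0, t0 = a, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1          # new r1 has smaller absolute value than old r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    if r0 < 0:                            # the text's gcd is non-negative
        r0, s0, t0 = -r0, -s0, -t0
    return r0, s0, t0


print(extended_gcd(3, 5))   # (1, 2, -1): 3*2 + 5*(-1) = 1
print(extended_gcd(4, 6))   # (2, -1, 1): 4*(-1) + 6*1 = 2
```

On the ideals of Examples 1.2 and 1.3, it recovers exactly the coefficients used there.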

Theorem 1.9. Let a, b, c ∈ ℤ such that c | ab and gcd(a, c) = 1. Then c | b.

Proof. Suppose that c | ab and gcd(a, c) = 1. Then since gcd(a, c) = 1, by
Theorem 1.8 we have as + ct = 1 for some s, t ∈ ℤ. Multiplying this equation by
b, we obtain

$$abs + cbt = b. \tag{1.1}$$

Since c divides ab by hypothesis, and since c clearly divides cbt, it follows that c
divides the left-hand side of (1.1), and hence that c divides b. □

Suppose that p is a prime and a is any integer. As the only divisors of p are ±1
and ±p, we have

$$\begin{aligned} p \mid a &\implies \gcd(a, p) = p, \text{ and} \\ p \nmid a &\implies \gcd(a, p) = 1. \end{aligned}$$

Combining this observation with the previous theorem, we have:

Theorem 1.10. Let p be prime, and let a, b ∈ ℤ. Then p | ab implies that p | a or
p | b.

Proof. Assume that p | ab. If p | a, we are done, so assume that p ∤ a. By the above
observation, gcd(a, p) = 1, and so by Theorem 1.9, we have p | b. □

An obvious corollary to Theorem 1.10 is that if a₁, …, aₖ are integers, and if p
is a prime that divides the product a₁ ⋯ aₖ, then p | aᵢ for some i = 1, …, k. This
is easily proved by induction on k. For k = 1, the statement is trivially true. Now
let k > 1, and assume that the statement holds for k − 1. Then by Theorem 1.10, either
p | a₁ or p | a₂ ⋯ aₖ; if p | a₁, we are done; otherwise, by induction, p divides one
of a₂, …, aₖ.

Finishing the proof of Theorem 1.3. We are now in a position to prove the unique-
ness part of Theorem 1.3, which we can state as follows: if p₁, …, pᵣ are primes
(not necessarily distinct), and q₁, …, qₛ are primes (also not necessarily distinct),
such that

$$p_1 \cdots p_r = q_1 \cdots q_s, \tag{1.2}$$

then (p₁, …, pᵣ) is just a reordering of (q₁, …, qₛ). We may prove this by induction
on r. If r = 0, we must have s = 0 and we are done. Now suppose r > 0, and
that the statement holds for r − 1. Since r > 0, we clearly must have s > 0.
Also, as p₁ obviously divides the left-hand side of (1.2), it must also divide the
right-hand side of (1.2); that is, p₁ | q₁ ⋯ qₛ. It follows from (the corollary to)
Theorem 1.10 that p₁ | qⱼ for some j = 1, …, s, and moreover, since qⱼ is prime,
we must have p₁ = qⱼ. Thus, we may cancel p₁ from the left-hand side of (1.2)
and qⱼ from the right-hand side of (1.2), and the statement now follows from the
induction hypothesis. That proves the uniqueness part of Theorem 1.3.
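Although the uniqueness argument is non-constructive, the existence part can be made concrete by trial division. The Python sketch below (the name `prime_factors` is illustrative) repeatedly divides out the smallest remaining divisor d ≥ 2, which must be prime because every smaller prime has already been removed; by the uniqueness just proved, the sorted list it returns is the same no matter how the factorization is found. Trial division is only practical for small n.

```python
def prime_factors(n):
    """Prime factors of a positive integer n, in non-decreasing order, with repetition."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:       # d is prime: all smaller primes were already divided out
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                   # no divisor <= sqrt(n) remains, so n itself is prime
        factors.append(n)
    return factors


print(prime_factors(360))     # [2, 2, 2, 3, 3, 5]
print(prime_factors(1))       # [] (the empty product, r = 0)
```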
Exercise 1.8. Let I be a non-empty set of integers that is closed under addition
(i.e., a + b ∈ I for all a, b ∈ I). Show that I is an ideal if and only if −a ∈ I for all
a ∈ I.

Exercise 1.9. Show that for all integers a, b, c, we have:

(a) gcd(a, b) = gcd(b, a);

(b) gcd(a, b) = |a| ⟺ a | b;

(c) gcd(a, 0) = gcd(a, a) = |a| and gcd(a, 1) = 1;

(d) gcd(ca, cb) = |c| gcd(a, b).

Exercise 1.10. Show that for all integers a, b with d := gcd(a, b) ≠ 0, we have
gcd(a/d, b/d) = 1.

Exercise 1.11. Let n be an integer. Show that if a, b are relatively prime integers,
each of which divides n, then ab divides n.

Exercise 1.12 . Show that two integers are relatively prime if and only if there is 
no one prime that divides both of them. 

Exercise 1.13. Let a, b₁, …, bₖ be integers. Show that gcd(a, b₁ ⋯ bₖ) = 1 if
and only if gcd(a, bᵢ) = 1 for i = 1, …, k.

Exercise 1.14. Let p be a prime and k an integer, with 0 < k < p. Show that the
binomial coefficient

$$\binom{p}{k} = \frac{p!}{k!\,(p-k)!},$$

which is an integer (see §A2), is divisible by p.

Exercise 1.15. An integer a is called square-free if it is not divisible by the
square of any integer greater than 1. Show that:

(a) a is square-free if and only if a = ±p₁ ⋯ pᵣ, where the pᵢ's are distinct
primes;

(b) every positive integer n can be expressed uniquely as n = ab², where a and
b are positive integers, and a is square-free.

Exercise 1.16. For each positive integer m, let Iₘ denote {0, …, m − 1}. Let
a, b be positive integers, and consider the map

$$\tau : I_b \times I_a \to I_{ab}, \qquad (s, t) \mapsto (as + bt) \bmod ab.$$

Show τ is a bijection if and only if gcd(a, b) = 1.
Exercise 1.17. Let a, b, c be positive integers satisfying gcd(a, b) = 1 and
c ≥ (a − 1)(b − 1). Show that there exist non-negative integers s, t such that
c = as + bt.

Exercise 1.18. For each positive integer n, let Dₙ denote the set of positive
divisors of n. Let n₁, n₂ be relatively prime, positive integers. Show that the sets
$D_{n_1} \times D_{n_2}$ and $D_{n_1 n_2}$ are in one-to-one correspondence, via the map that sends
(d₁, d₂) ∈ $D_{n_1} \times D_{n_2}$ to d₁d₂.


1.3 Some consequences of unique factorization 

The following theorem is a consequence of just the existence part of Theorem 1.3: 

Theorem 1.11. There are infinitely many primes. 

Proof. By way of contradiction, suppose that there were only finitely many primes;
call them p₁, …, pₖ. Then set M := ∏ᵢ₌₁ᵏ pᵢ and N := M + 1. Consider a prime
p that divides N. There must be at least one such prime p, since N ≥ 2, and
every positive integer can be written as a product of primes. Clearly, p cannot
equal any of the pᵢ's, since if it did, then p would divide M, and hence also divide
N − M = 1, which is impossible. Therefore, the prime p is not among p₁, …, pₖ,
which contradicts our assumption that these are the only primes. □
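The proof is constructive enough to run. The Python sketch below (the name `new_prime_from` is hypothetical) forms N = p₁ ⋯ pₖ + 1 from a given list of primes and returns the smallest prime divisor of N, found by trial division. Note that the prime it produces need not be the next prime after those listed: starting from 2, 3, 5, 7, 11, 13 gives N = 30031 = 59 · 509.

```python
def new_prime_from(primes):
    """Return a prime that is not in the given finite list of primes.

    Mirrors the proof of Theorem 1.11: N = (product of the list) + 1 has a
    prime divisor, and no prime in the list divides N.
    """
    N = 1
    for p in primes:
        N *= p
    N += 1
    d = 2
    while d * d <= N:
        if N % d == 0:
            return d             # smallest divisor >= 2 of N, hence prime
        d += 1
    return N                     # N has no divisor <= sqrt(N), so N is prime


print(new_prime_from([2, 3, 5, 7, 11, 13]))   # 59, since 30031 = 59 * 509
```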

For each prime p, we may define the function vₚ, mapping non-zero integers to
non-negative integers, as follows: for every integer n ≠ 0, if n = pᵉm, where p ∤ m,
then vₚ(n) := e. We may then write the factorization of n into primes as

$$n = \pm \prod_p p^{v_p(n)},$$

where the product is over all primes p; although syntactically this is an infinite
product, all but finitely many of its terms are equal to 1, and so this expression
makes sense.
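The definition of vₚ translates directly into a loop that divides out p as long as possible. In this Python sketch the name `v` is illustrative, p is assumed to be prime (this is not checked), and n = 0 is rejected, matching the definition above, which applies only to non-zero integers.

```python
def v(p, n):
    """v_p(n): the exponent of the prime p in the non-zero integer n."""
    if n == 0:
        raise ValueError("v_p(n) is defined here only for n != 0")
    n = abs(n)                # the sign does not affect divisibility by p
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e


print([v(p, 360) for p in (2, 3, 5, 7)])   # [3, 2, 1, 0], since 360 = 2^3 * 3^2 * 5
```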

Observe that if a and b are non-zero integers, then

$$v_p(a \cdot b) = v_p(a) + v_p(b) \quad \text{for all primes } p, \tag{1.3}$$

and

$$a \mid b \iff v_p(a) \le v_p(b) \quad \text{for all primes } p. \tag{1.4}$$

From this, it is clear that

$$\gcd(a, b) = \prod_p p^{\min(v_p(a),\, v_p(b))}.$$
Least common multiples. For a, b ∈ ℤ, a common multiple of a and b is an
integer m such that a | m and b | m; moreover, such an m is the least common
multiple of a and b if m is non-negative and m divides all common multiples of
a and b. It is easy to see that the least common multiple exists and is unique,
and we denote the least common multiple of a and b by lcm(a, b). Indeed, for all
a, b ∈ ℤ, if either a or b is zero, the only common multiple of a and b is 0, and so
lcm(a, b) = 0; otherwise, if neither a nor b is zero, we have

$$\operatorname{lcm}(a, b) = \prod_p p^{\max(v_p(a),\, v_p(b))},$$

or equivalently, lcm(a, b) may be characterized as the smallest positive integer
divisible by both a and b.
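The two product formulas can be checked numerically. The Python sketch below (function names are illustrative) factors a and b by trial division, then takes the minimum and maximum of the exponents prime by prime. It is only a sanity check: factoring is far more expensive than computing gcd(a, b) directly, which is why practical routines such as Python's `math.gcd` do not factor their arguments.

```python
def factor_exponents(n):
    """Map each prime p dividing the non-zero integer n to v_p(n), by trial division."""
    n = abs(n)
    exps = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps


def gcd_lcm_by_exponents(a, b):
    """(gcd(a, b), lcm(a, b)) for non-zero a, b, via min/max of prime exponents."""
    ea, eb = factor_exponents(a), factor_exponents(b)
    g = l = 1
    for p in set(ea) | set(eb):       # primes missing from a dict have exponent 0
        g *= p ** min(ea.get(p, 0), eb.get(p, 0))
        l *= p ** max(ea.get(p, 0), eb.get(p, 0))
    return g, l


print(gcd_lcm_by_exponents(360, 84))    # (12, 2520)
```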

It is convenient to extend the domain of definition of vₚ to include 0, defining
vₚ(0) := ∞. If we interpret expressions involving “∞” appropriately (see Prelimi-
naries), then for arbitrary a, b ∈ ℤ, both (1.3) and (1.4) hold, and in addition,

$$v_p(\gcd(a, b)) = \min(v_p(a), v_p(b)) \quad\text{and}\quad v_p(\operatorname{lcm}(a, b)) = \max(v_p(a), v_p(b))$$

for all primes p.

Generalizing gcd’s and lcm’s to many integers. It is easy to generalize the
notions of greatest common divisor and least common multiple from two integers
to many integers. Let a₁, …, aₖ be integers. We call d ∈ ℤ a common divisor
of a₁, …, aₖ if d | aᵢ for i = 1, …, k; moreover, we call such a d the greatest
common divisor of a₁, …, aₖ if d is non-negative and all other common divi-
sors of a₁, …, aₖ divide d. The greatest common divisor of a₁, …, aₖ is denoted
gcd(a₁, …, aₖ) and is the unique non-negative integer d satisfying

$$v_p(d) = \min(v_p(a_1), \ldots, v_p(a_k)) \quad \text{for all primes } p.$$

Analogously, we call m ∈ ℤ a common multiple of a₁, …, aₖ if aᵢ | m for all
i = 1, …, k; moreover, such an m is called the least common multiple of a₁, …, aₖ
if m divides all common multiples of a₁, …, aₖ. The least common multiple of
a₁, …, aₖ is denoted lcm(a₁, …, aₖ) and is the unique non-negative integer m sat-
isfying

$$v_p(m) = \max(v_p(a_1), \ldots, v_p(a_k)) \quad \text{for all primes } p.$$

Finally, we say that the family {aᵢ}ᵢ₌₁ᵏ is pairwise relatively prime if for all indices
i, j with i ≠ j, we have gcd(aᵢ, aⱼ) = 1. Certainly, if {aᵢ}ᵢ₌₁ᵏ is pairwise relatively
prime, and k > 1, then gcd(a₁, …, aₖ) = 1; however, gcd(a₁, …, aₖ) = 1 does not
imply that {aᵢ}ᵢ₌₁ᵏ is pairwise relatively prime.
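For several integers, the following Python sketch (function names are illustrative) reduces the many-argument gcd and lcm to repeated two-argument ones, using the identities of Exercise 1.22, and computes each two-argument lcm via gcd(a, b) · lcm(a, b) = |ab| from Exercise 1.21. The family 6, 10, 15 shows that gcd(a₁, …, aₖ) = 1 does not imply pairwise relative primality.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def gcd_many(*nums):
    """gcd(a_1, ..., a_k) as gcd(gcd(a_1, ..., a_{k-1}), a_k); gcd(0, a) = |a| starts the fold."""
    return reduce(gcd, nums, 0)


def lcm_many(*nums):
    """lcm(a_1, ..., a_k); the result is 0 if any a_i is 0."""
    def lcm2(a, b):
        return 0 if a == 0 or b == 0 else abs(a * b) // gcd(a, b)
    return reduce(lcm2, nums, 1)


def pairwise_relatively_prime(*nums):
    """True if gcd(a_i, a_j) = 1 for all i != j."""
    return all(gcd(x, y) == 1 for x, y in combinations(nums, 2))


print(gcd_many(6, 10, 15), pairwise_relatively_prime(6, 10, 15))   # 1 False
print(lcm_many(6, 10, 15))                                          # 30
```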
Rational numbers. Consider the rational numbers ℚ = {a/b : a, b ∈ ℤ, b ≠ 0}.
Given any rational number a/b, if we set d := gcd(a, b), and define the integers
a₀ := a/d and b₀ := b/d, then we have a/b = a₀/b₀ and gcd(a₀, b₀) = 1. More-
over, if a₁/b₁ = a₀/b₀, then we have a₁b₀ = a₀b₁, and so b₀ | a₀b₁; also, since
gcd(a₀, b₀) = 1, we see that b₀ | b₁; writing b₁ = b₀c, we see that a₁ = a₀c. Thus,
we can represent every rational number as a fraction in lowest terms, which means
a fraction of the form a₀/b₀ where a₀ and b₀ are relatively prime; moreover, the
values of a₀ and b₀ are uniquely determined up to sign, and every other fraction
that represents the same rational number is of the form a₀c/b₀c, for some non-zero
integer c.
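Reducing a fraction to lowest terms is exactly the computation described above. The Python sketch below (the name `lowest_terms` is illustrative) additionally resolves the sign ambiguity by making the denominator positive, which is one common convention and also the one used by Python's `fractions.Fraction`.

```python
from fractions import Fraction
from math import gcd


def lowest_terms(a, b):
    """Return (a0, b0) with a/b = a0/b0, gcd(a0, b0) = 1 and b0 > 0."""
    if b == 0:
        raise ZeroDivisionError("denominator must be non-zero")
    d = gcd(a, b)                 # d > 0 because b != 0
    a0, b0 = a // d, b // d       # exact divisions
    if b0 < 0:                    # fix the sign: a0/b0 = (-a0)/(-b0)
        a0, b0 = -a0, -b0
    return a0, b0


print(lowest_terms(-12, -18))   # (2, 3)
print(Fraction(-12, -18))       # 2/3, the same normalization
```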

Exercise 1.19. Let n be an integer. Generalizing Exercise 1.11, show that if
{aᵢ}ᵢ₌₁ᵏ is a pairwise relatively prime family of integers, where each aᵢ divides n,
then their product ∏ᵢ₌₁ᵏ aᵢ also divides n.

Exercise 1.20. Show that for all integers a, b, c, we have:

(a) lcm(a, b) = lcm(b, a);

(b) lcm(a, b) = |a| ⟺ b | a;

(c) lcm(a, a) = lcm(a, 1) = |a|;

(d) lcm(ca, cb) = |c| lcm(a, b).

Exercise 1.21. Show that for all integers a, b, we have:

(a) gcd(a, b) · lcm(a, b) = |ab|;

(b) gcd(a, b) = 1 ⟹ lcm(a, b) = |ab|.

Exercise 1.22. Let a₁, …, aₖ ∈ ℤ with k > 1. Show that:

$$\gcd(a_1, \ldots, a_k) = \gcd(a_1, \gcd(a_2, \ldots, a_k)) = \gcd(\gcd(a_1, \ldots, a_{k-1}), a_k);$$
$$\operatorname{lcm}(a_1, \ldots, a_k) = \operatorname{lcm}(a_1, \operatorname{lcm}(a_2, \ldots, a_k)) = \operatorname{lcm}(\operatorname{lcm}(a_1, \ldots, a_{k-1}), a_k).$$

Exercise 1.23. Let a₁, …, aₖ ∈ ℤ with d := gcd(a₁, …, aₖ). Show that dℤ =
a₁ℤ + ⋯ + aₖℤ; in particular, there exist integers z₁, …, zₖ such that d = a₁z₁ +
⋯ + aₖzₖ.

Exercise 1.24. Show that if {aᵢ}ᵢ₌₁ᵏ is a pairwise relatively prime family of inte-
gers, then lcm(a₁, …, aₖ) = |a₁ ⋯ aₖ|.

Exercise 1.25. Show that every non-zero x ∈ ℚ can be expressed as

$$x = \pm p_1^{e_1} \cdots p_r^{e_r},$$

where the pᵢ's are distinct primes and the eᵢ's are non-zero integers, and that this
expression is unique up to a reordering of the primes.
Exercise 1.26. Let n and k be positive integers, and suppose that xᵏ = n for
some x ∈ ℚ. Show that x ∈ ℤ. In other words, ᵏ√n is either an integer
or is irrational.

Exercise 1.27. Show that gcd(a + b, lcm(a, b)) = gcd(a, b) for all a, b ∈ ℤ.

Exercise 1.28. Show that for every positive integer k, there exist k consecutive
composite integers. Thus, there are arbitrarily large gaps between primes.

Exercise 1.29. Let p be a prime. Show that for all a, b ∈ ℤ, we have vₚ(a + b) ≥
min{vₚ(a), vₚ(b)}, and vₚ(a + b) = vₚ(a) if vₚ(a) < vₚ(b).

Exercise 1.30. For a given prime p, we may extend the domain of definition of
vₚ from ℤ to ℚ: for non-zero integers a, b, let us define vₚ(a/b) := vₚ(a) − vₚ(b).
Show that:

(a) this definition of vₚ(a/b) is unambiguous, in the sense that it does not
depend on the particular choice of a and b;

(b) for all x, y ∈ ℚ, we have vₚ(xy) = vₚ(x) + vₚ(y);

(c) for all x, y ∈ ℚ, we have vₚ(x + y) ≥ min{vₚ(x), vₚ(y)}, and vₚ(x + y) =
vₚ(x) if vₚ(x) < vₚ(y);

(d) for all non-zero x ∈ ℚ, we have $x = \pm \prod_p p^{v_p(x)}$, where the product is over
all primes, and all but a finite number of terms in the product are equal to 1;

(e) for all x ∈ ℚ, we have x ∈ ℤ if and only if vₚ(x) ≥ 0 for all primes p.

Exercise 1.31. Let n be a positive integer, and let 2ᵏ be the highest power of 2
in the set S := {1, …, n}. Show that 2ᵏ does not divide any other element in S.

Exercise 1.32. Let n ∈ ℤ with n > 1. Show that ∑ᵢ₌₁ⁿ 1/i is not an integer.

Exercise 1.33. Let n be a positive integer, and let Cₙ denote the number of pairs
of integers (a, b) with a, b ∈ {1, …, n} and gcd(a, b) = 1, and let Fₙ be the number
of distinct rational numbers a/b, where 0 ≤ a < b ≤ n.

(a) Show that Fₙ = (Cₙ + 1)/2.

(b) Show that Cₙ ≥ n²/4. Hint: first show that $C_n \ge n^2 \bigl(1 - \sum_{d \ge 2} 1/d^2\bigr)$, and
then show that $\sum_{d \ge 2} 1/d^2 \le 3/4$.

Exercise 1.34. This exercise develops a characterization of least common mul-
tiples in terms of ideals.

(a) Arguing directly from the definition of an ideal, show that if I and J are
ideals of ℤ, then so is I ∩ J.

(b) Let a, b ∈ ℤ, and consider the ideals I := aℤ and J := bℤ. By part
(a), we know that I ∩ J is an ideal. By Theorem 1.6, we know that
I ∩ J = mℤ for some uniquely determined non-negative integer m. Show
that m = lcm(a, b).



2 

Congruences 


This chapter introduces the basic properties of congruences modulo n, along with 
the related notion of residue classes modulo n. Other items discussed include the 
Chinese remainder theorem, Euler’s phi function, Euler’s theorem, Fermat’s little 
theorem, quadratic residues, and finally, summations over divisors. 


2.1 Equivalence relations 

Before discussing congruences, we review the definition and basic properties of 
equivalence relations. 

Let S be a set. A binary relation ~ on S is called an equivalence relation if it is
reflexive: a ~ a for all a ∈ S,
symmetric: a ~ b implies b ~ a for all a, b ∈ S, and
transitive: a ~ b and b ~ c implies a ~ c for all a, b, c ∈ S.

If ~ is an equivalence relation on S, then for a ∈ S one defines its equivalence
class as the set {x ∈ S : x ~ a}.

Theorem 2.1. Let ~ be an equivalence relation on a set S, and for a ∈ S, let [a]
denote its equivalence class. Then for all a, b ∈ S, we have:

(i) a ∈ [a];

(ii) a ∈ [b] implies [a] = [b].

Proof. (i) follows immediately from reflexivity. For (ii), suppose a ∈ [b], so that
a ~ b by definition. We want to show that [a] = [b]. To this end, consider any
x ∈ S. We have

$$\begin{aligned} x \in [a] &\implies x \sim a \quad \text{(by definition)} \\ &\implies x \sim b \quad \text{(by transitivity, and since } x \sim a \text{ and } a \sim b\text{)} \\ &\implies x \in [b]. \end{aligned}$$

Thus, [a] ⊆ [b]. By symmetry, we also have b ~ a, and reversing the roles of a and
b in the above argument, we see that [b] ⊆ [a]. □

This theorem implies that each equivalence class is non-empty, and that each
element of S belongs to a unique equivalence class; in other words, the distinct
equivalence classes form a partition of S (see Preliminaries). A member of an
equivalence class is called a representative of the class.


Exercise 2.1. Consider the relations =, ≤, and < on the set ℝ. Which of these
are equivalence relations? Explain your answers.

Exercise 2.2. Let S := (ℝ × ℝ) \ {(0, 0)}. For (x, y), (x′, y′) ∈ S, let us say
(x, y) ~ (x′, y′) if there exists a real number λ > 0 such that (x, y) = (λx′, λy′).
Show that ~ is an equivalence relation; moreover, show that each equivalence class
contains a unique representative that lies on the unit circle (i.e., the set of points
(x, y) such that x² + y² = 1).


2.2 Definitions and basic properties of congruences 

Let n be a positive integer. For integers a and b, we say that a is congruent to b
modulo n if n | (a − b), and we write a ≡ b (mod n). If n ∤ (a − b), then we write
a ≢ b (mod n). Equivalently, a ≡ b (mod n) if and only if a = b + ny for some
y ∈ ℤ. The relation a ≡ b (mod n) is called a congruence relation, or simply, a
congruence. The number n appearing in such congruences is called the modulus
of the congruence. This usage of the “mod” notation as part of a congruence is not
to be confused with the “mod” operation introduced in §1.1.
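In code, a congruence is naturally tested by checking whether n divides a − b. The Python sketch below (the name `congruent` is illustrative) does this with the `%` operator. With a positive modulus, Python's `%` returns a value in {0, …, n − 1} even for negative operands, so it agrees with the “mod” operation of §1.1; this is a language-specific detail, since in C or Java `-11 % 7` evaluates to −4.

```python
def congruent(a, b, n):
    """True iff a ≡ b (mod n), i.e. n divides a - b; n must be positive."""
    if n <= 0:
        raise ValueError("modulus must be positive")
    return (a - b) % n == 0


# With n > 0, Python's % lands in {0, ..., n-1}, even for negative a.
print(-11 % 7, 3 % 7)          # 3 3
print(congruent(-11, 3, 7))    # True
```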

If we view the modulus n as fixed, then the following theorem says that the
binary relation “· ≡ · (mod n)” is an equivalence relation on the set ℤ.

Theorem 2.2. Let n be a positive integer. For all a, b, c ∈ ℤ, we have:

(i) a ≡ a (mod n);

(ii) a ≡ b (mod n) implies b ≡ a (mod n);

(iii) a ≡ b (mod n) and b ≡ c (mod n) implies a ≡ c (mod n).

Proof. For (i), observe that n divides 0 = a − a. For (ii), observe that if n divides
a − b, then it also divides −(a − b) = b − a. For (iii), observe that if n divides a − b
and b − c, then it also divides (a − b) + (b − c) = a − c. □

Another key property of congruences is that they are “compatible” with integer 
addition and multiplication, in the following sense: 

Theorem 2.3. Let a, a′, b, b′, n ∈ ℤ with n > 0. If

$$a \equiv a' \pmod{n} \quad\text{and}\quad b \equiv b' \pmod{n},$$

then

$$a + b \equiv a' + b' \pmod{n} \quad\text{and}\quad a \cdot b \equiv a' \cdot b' \pmod{n}.$$

Proof. Suppose that a ≡ a′ (mod n) and b ≡ b′ (mod n). This means that there
exist integers x and y such that a = a′ + nx and b = b′ + ny. Therefore,

$$a + b = a' + b' + n(x + y),$$

which proves the first congruence of the theorem, and

$$ab = (a' + nx)(b' + ny) = a'b' + n(a'y + b'x + nxy),$$

which proves the second congruence. □

Theorems 2.2 and 2.3 allow one to work with congruence relations modulo n
much as one would with ordinary equalities: one can add to, subtract from, or
multiply both sides of a congruence modulo n by the same integer; also, if b is
congruent to a modulo n, one may substitute b for a in any simple arithmetic
expression (involving addition, subtraction, and multiplication) appearing in a con-
gruence modulo n.

Now suppose a is an arbitrary, fixed integer, and consider the set of integers z
that satisfy the congruence z ≡ a (mod n). Since z satisfies this congruence if
and only if z = a + ny for some y ∈ ℤ, we may apply Theorems 1.4 and 1.5
(with a as given, and b := n) to deduce that every interval of n consecutive integers
contains exactly one such z. This simple fact is of such fundamental importance
that it deserves to be stated as a theorem:

Theorem 2.4. Let a, n ∈ ℤ with n > 0. Then there exists a unique integer z such
that z ≡ a (mod n) and 0 ≤ z < n, namely, z := a mod n. More generally, for
every x ∈ ℝ, there exists a unique integer z ∈ [x, x + n) such that z ≡ a (mod n).

Example 2.1. Let us find the set of solutions z to the congruence

$$3z + 4 \equiv 6 \pmod{7}. \tag{2.1}$$
Suppose that z is a solution to (2.1). Subtracting 4 from both sides of (2.1), we
obtain

$$3z \equiv 2 \pmod{7}. \tag{2.2}$$

Next, we would like to divide both sides of this congruence by 3, to get z by itself
on the left-hand side. We cannot do this directly, but since 5 · 3 ≡ 1 (mod 7), we
can achieve the same effect by multiplying both sides of (2.2) by 5. If we do this,
and then replace 5 · 3 by 1, and 5 · 2 by 3, we obtain

$$z \equiv 3 \pmod{7}.$$

Thus, if z is a solution to (2.1), we must have z ≡ 3 (mod 7); conversely, one can
verify that if z ≡ 3 (mod 7), then (2.1) holds. We conclude that the integers z that
are solutions to (2.1) are precisely those integers that are congruent to 3 modulo 7,
which we can list as follows:

…, −18, −11, −4, 3, 10, 17, 24, … □
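A brute-force search over a window of integers confirms the answer. In this Python sketch, the name `solutions_in_range` and the window [−20, 30) are illustrative choices; each z is tested against (2.1) directly, and the last line shows why multiplying by 5 acts as division by 3 modulo 7.

```python
def solutions_in_range(lo, hi):
    """Integers z in [lo, hi) with 3z + 4 ≡ 6 (mod 7), found by direct testing."""
    return [z for z in range(lo, hi) if (3 * z + 4 - 6) % 7 == 0]


print(solutions_in_range(-20, 30))   # [-18, -11, -4, 3, 10, 17, 24]
print((5 * 3) % 7, (5 * 2) % 7)      # 1 3
```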

In the next section, we shall give a systematic treatment of the problem of solving
linear congruences, such as the one appearing in the previous example.


Exercise 2.3. Let a, b, n ∈ ℤ with n > 0. Show that a ≡ b (mod n) if and only if
(a mod n) = (b mod n).

Exercise 2.4. Let a, b, n ∈ ℤ with n > 0 and a ≡ b (mod n). Also, let
c₀, c₁, …, cₖ ∈ ℤ. Show that

$$c_0 + c_1 a + \cdots + c_k a^k \equiv c_0 + c_1 b + \cdots + c_k b^k \pmod{n}.$$

Exercise 2.5. Let a, b, n, n′ ∈ ℤ with n > 0, n′ > 0, and n′ | n. Show that if
a ≡ b (mod n), then a ≡ b (mod n′).

Exercise 2.6. Let a, b, n, n′ ∈ ℤ with n > 0, n′ > 0, and gcd(n, n′) = 1. Show
that if a ≡ b (mod n) and a ≡ b (mod n′), then a ≡ b (mod nn′).

Exercise 2.7. Let a, b, n ∈ ℤ with n > 0 and a ≡ b (mod n). Show that
gcd(a, n) = gcd(b, n).

Exercise 2.8. Let a be a positive integer whose base-10 representation is a =
(aₖ₋₁ ⋯ a₁a₀)₁₀. Let b be the sum of the decimal digits of a; that is, let b :=
a₀ + a₁ + ⋯ + aₖ₋₁. Show that a ≡ b (mod 9). From this, justify the usual “rules of
thumb” for determining divisibility by 9 and 3: a is divisible by 9 (respectively, 3)
if and only if the sum of the decimal digits of a is divisible by 9 (respectively, 3).
Exercise 2.9. Let e be a positive integer. For a ∈ {0, …, 2ᵉ − 1}, let ā denote
the integer obtained by inverting the bits in the e-bit, binary representation of a
(note that ā ∈ {0, …, 2ᵉ − 1}). Show that ā + 1 ≡ −a (mod 2ᵉ). This justifies the
usual rule for computing negatives in 2’s complement arithmetic (which is really
just arithmetic modulo 2ᵉ).

Exercise 2.10. Show that the equation 7y³ + 2 = z³ has no solutions y, z ∈ ℤ.

Exercise 2.11. Show that there are 14 distinct, possible, yearly (Gregorian) 
calendars, and show that all 14 calendars actually occur. 


2.3 Solving linear congruences 

In this section, we consider the general problem of solving linear congruences.
More precisely, for a given positive integer n, and arbitrary integers a and b, we
wish to determine the set of integers z that satisfy the congruence

$$az \equiv b \pmod{n}. \tag{2.3}$$

Observe that if (2.3) has a solution z, and if z ≡ z′ (mod n), then z′ is also a
solution to (2.3). However, (2.3) may or may not have a solution, and if it does,
such solutions may or may not be uniquely determined modulo n. The following
theorem precisely characterizes the set of solutions of (2.3); basically, it says that
(2.3) has a solution if and only if d := gcd(a, n) divides b, in which case the
solution is uniquely determined modulo n/d.

Theorem 2.5. Let a, n ∈ ℤ with n > 0, and let d := gcd(a, n).

(i) For every b ∈ ℤ, the congruence az ≡ b (mod n) has a solution z ∈ ℤ if
and only if d | b.

(ii) For every z ∈ ℤ, we have az ≡ 0 (mod n) if and only if z ≡ 0 (mod n/d).

(iii) For all z, z′ ∈ ℤ, we have az ≡ az′ (mod n) if and only if z ≡ z′ (mod n/d).

Proof. For (i), let b ∈ ℤ be given. Then we have

$$\begin{aligned} az \equiv b \pmod{n} \text{ for some } z \in \mathbb{Z} &\iff az = b + ny \text{ for some } z, y \in \mathbb{Z} \quad \text{(by definition of congruence)} \\ &\iff az - ny = b \text{ for some } z, y \in \mathbb{Z} \\ &\iff d \mid b \quad \text{(by Theorem 1.8).} \end{aligned}$$

For (ii), we have

$$n \mid az \iff n/d \mid (a/d)z \iff n/d \mid z.$$

All of these implications follow rather trivially from the definition of divisibility,
except that for the implication n/d | (a/d)z ⟹ n/d | z, we use Theorem 1.9
and the fact that gcd(a/d, n/d) = 1.

For (iii), we have

$$\begin{aligned} az \equiv az' \pmod{n} &\iff a(z - z') \equiv 0 \pmod{n} \\ &\iff z - z' \equiv 0 \pmod{n/d} \quad \text{(by part (ii))} \\ &\iff z \equiv z' \pmod{n/d}. \end{aligned}$$

□

We can restate Theorem 2.5 in more concrete terms as follows. Let a, n ∈ ℤ
with n > 0, and let d := gcd(a, n). Let Iₙ := {0, …, n − 1} and consider the
“multiplication by a” map

$$\tau_a : I_n \to I_n, \qquad z \mapsto az \bmod n.$$

The image of τₐ consists of the n/d integers

$$i \cdot d \quad (i = 0, \ldots, n/d - 1).$$

Moreover, every element b in the image of τₐ has precisely d pre-images

$$z_0 + j \cdot (n/d) \quad (j = 0, \ldots, d - 1),$$

where z₀ ∈ {0, …, n/d − 1}. In particular, τₐ is a bijection if and only if a and n
are relatively prime.
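The description of τₐ is easy to check by tabulating pre-images. The Python sketch below (the name `multiplication_map` is hypothetical) builds, for each value b in the image of τₐ, the list of z ∈ Iₙ with az mod n = b. For n = 15 and a = 6 (so d = 3), the image is the five multiples of 3, and each has three pre-images spaced n/d = 5 apart.

```python
from math import gcd


def multiplication_map(a, n):
    """For tau_a(z) = a*z mod n on {0, ..., n-1}, map each image value to its pre-images."""
    pre = {}
    for z in range(n):                       # z increases, so each list comes out sorted
        pre.setdefault(a * z % n, []).append(z)
    return pre


n, a = 15, 6
d = gcd(a, n)
pre = multiplication_map(a, n)
print(sorted(pre))   # [0, 3, 6, 9, 12]: the n/d multiples of d
print(pre[3])        # [3, 8, 13]: d pre-images, spaced n/d apart
```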

Example 2.2. The following table illustrates what Theorem 2.5 says for n = 15
and a = 1, 2, 3, 4, 5, 6.

z             0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
2z mod 15     0   2   4   6   8  10  12  14   1   3   5   7   9  11  13
3z mod 15     0   3   6   9  12   0   3   6   9  12   0   3   6   9  12
4z mod 15     0   4   8  12   1   5   9  13   2   6  10  14   3   7  11
5z mod 15     0   5  10   0   5  10   0   5  10   0   5  10   0   5  10
6z mod 15     0   6  12   3   9   0   6  12   3   9   0   6  12   3   9

In the second row, we are looking at the values 2z mod 15, and we see that this
row is just a permutation of the first row. So for every b, there exists a unique z
such that 2z ≡ b (mod 15). This is implied by the fact that gcd(2, 15) = 1.

In the third row, the only numbers hit are the multiples of 3, which follows from
the fact that gcd(3, 15) = 3. Also note that the pattern in this row repeats every five
columns; that is, 3z ≡ 3z′ (mod 15) if and only if z ≡ z′ (mod 5).

In the fourth row, we again see a permutation of the first row, which follows 
from the fact that gcd(4, 15) = 1. 
In the fifth row, the only numbers hit are the multiples of 5, which follows from
the fact that gcd(5, 15) = 5. Also note that the pattern in this row repeats every
three columns; that is, 5z ≡ 5z′ (mod 15) if and only if z ≡ z′ (mod 3).

In the sixth row, since gcd(6, 15) = 3, we see a permutation of the third row. 
The pattern repeats after five columns, although the pattern is a permutation of the 
pattern in the third row. □ 

We develop some further consequences of Theorem 2.5. 

A cancellation law. Let a, n ∈ ℤ with n > 0. Part (iii) of Theorem 2.5 gives us a
cancellation law for congruences:

if gcd(a, n) = 1 and az ≡ az′ (mod n), then z ≡ z′ (mod n).

More generally, if d := gcd(a, n), then we can cancel a from both sides of a con-
gruence modulo n, as long as we replace the modulus by n/d.

Example 2.3. Observe that

$$5 \cdot 2 \equiv 5 \cdot (-4) \pmod{6}. \tag{2.4}$$

Part (iii) of Theorem 2.5 tells us that since gcd(5, 6) = 1, we may cancel the
common factor of 5 from both sides of (2.4), obtaining 2 ≡ −4 (mod 6), which
one can also verify directly.

Next observe that

$$15 \cdot 5 \equiv 15 \cdot 3 \pmod{6}. \tag{2.5}$$

We cannot simply cancel the common factor of 15 from both sides of (2.5); indeed,
5 ≢ 3 (mod 6). However, gcd(15, 6) = 3, and as part (iii) of Theorem 2.5 guaran-
tees, we do indeed have 5 ≡ 3 (mod 2). □

Modular inverses. Again, let a, n ∈ ℤ with n > 0. We say that z ∈ ℤ is a
multiplicative inverse of a modulo n if az ≡ 1 (mod n). Part (i) of Theorem 2.5
says that a has a multiplicative inverse modulo n if and only if gcd(a, n) = 1.
Moreover, part (iii) of Theorem 2.5 says that the multiplicative inverse of a, if
it exists, is uniquely determined modulo n; that is, if z and z′ are multiplicative
inverses of a modulo n, then z ≡ z′ (mod n). Note that if z is a multiplicative
inverse of a modulo n, then a is a multiplicative inverse of z modulo n. Also note
that if a ≡ a′ (mod n), then z is a multiplicative inverse of a modulo n if and only
if z is a multiplicative inverse of a′ modulo n.

Now suppose that a, b, n ∈ ℤ with n > 0, a ≠ 0, and gcd(a, n) = 1. Theorem 2.5
says that there exists a unique integer z satisfying

$$az \equiv b \pmod{n} \quad\text{and}\quad 0 \le z < n.$$

Setting s := b/a ∈ ℚ, we may generalize the “mod” operation, defining s mod
n to be this value z. As the reader may easily verify, this definition of s mod n
does not depend on the particular choice of fraction used to represent the rational
number s. With this notation, we can simply write a⁻¹ mod n to denote the unique
multiplicative inverse of a modulo n that lies in the interval {0, …, n − 1}.

Example 2.4. Looking back at the table in Example 2.2, we see that

$$2^{-1} \bmod 15 = 8 \quad\text{and}\quad 4^{-1} \bmod 15 = 4,$$

and that none of 3, 5, or 6 has a multiplicative inverse modulo 15. □
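Since Python 3.8, the built-in three-argument `pow` accepts the exponent −1 and returns a⁻¹ mod n, raising `ValueError` when no inverse exists. The small wrapper below (the name `inverse_mod` is hypothetical) reproduces the inverses read off the table in Example 2.2.

```python
def inverse_mod(a, n):
    """Return a^{-1} mod n in {0, ..., n-1}, or None if gcd(a, n) != 1 (Python 3.8+)."""
    try:
        return pow(a, -1, n)
    except ValueError:          # raised when a is not invertible modulo n
        return None


print({a: inverse_mod(a, 15) for a in range(1, 7)})
# {1: 1, 2: 8, 3: None, 4: 4, 5: None, 6: None}
```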

Example 2.5. Let a, b, n ∈ ℤ with n > 0. We can describe the set of solutions z ∈ ℤ
to the congruence az ≡ b (mod n) very succinctly in terms of modular inverses.

If gcd(a, n) = 1, then setting t := a⁻¹ mod n, and z₀ := tb mod n, we see that
z₀ is the unique solution to the congruence az ≡ b (mod n) that lies in the interval
{0, …, n − 1}.

More generally, if d := gcd(a, n), then the congruence az ≡ b (mod n) has
a solution if and only if d | b. So suppose that d | b. In this case, if we set
a′ := a/d, b′ := b/d, and n′ := n/d, then for each z ∈ ℤ, we have az ≡ b (mod n)
if and only if a′z ≡ b′ (mod n′). Moreover, gcd(a′, n′) = 1, and therefore, if
we set t := (a′)⁻¹ mod n′ and z₀ := tb′ mod n′, then the solutions to the con-
gruence az ≡ b (mod n) that lie in the interval {0, …, n − 1} are the d integers

$$z_0,\; z_0 + n',\; \ldots,\; z_0 + (d - 1)n'. \qquad \square$$
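The recipe of Example 2.5 translates step by step into code. In the Python sketch below (the name `solve_linear_congruence` is illustrative; `pow(x, -1, m)` requires Python 3.8+), the case n′ = 1 is handled separately, since then every integer solves the reduced congruence.

```python
from math import gcd


def solve_linear_congruence(a, b, n):
    """All z in {0, ..., n-1} with a*z ≡ b (mod n), following Example 2.5."""
    if n <= 0:
        raise ValueError("modulus must be positive")
    d = gcd(a, n)
    if b % d != 0:
        return []                      # Theorem 2.5 (i): no solution
    a1, b1, n1 = a // d, b // d, n // d
    if n1 == 1:
        z0 = 0                         # every z solves a'z ≡ b' (mod 1)
    else:
        t = pow(a1, -1, n1)            # (a')^{-1} mod n', exists since gcd(a', n') = 1
        z0 = t * b1 % n1
    return [z0 + j * n1 for j in range(d)]


print(solve_linear_congruence(3, 2, 7))    # [3], congruence (2.2)
print(solve_linear_congruence(6, 3, 15))   # [3, 8, 13]
print(solve_linear_congruence(6, 4, 15))   # [], since 3 does not divide 4
```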


Exercise 2.12. Let a₁, …, aₖ, b, n be integers with n > 0. Show that the con-
gruence

$$a_1 z_1 + \cdots + a_k z_k \equiv b \pmod{n}$$

has a solution z₁, …, zₖ ∈ ℤ if and only if d | b, where d := gcd(a₁, …, aₖ, n).


Exercise 2.13. Let p be a prime, and let a, b, c, e be integers, such that e > 0,
a ≢ 0 (mod pᵉ⁺¹), and 0 ≤ c < pᵉ. Define N to be the number of integers
z ∈ {0, …, p²ᵉ − 1} such that

$$\left\lfloor \frac{(az + b) \bmod p^{2e}}{p^{e}} \right\rfloor = c.$$

Show that N = pᵉ.


2.4 The Chinese remainder theorem 

Next, we consider systems of linear congruences with respect to moduli that are
relatively prime in pairs. The result we state here is known as the Chinese remain-
der theorem, and is extremely useful in a number of contexts.
Theorem 2.6 (Chinese remainder theorem). Let {nᵢ}ᵢ₌₁ᵏ be a pairwise relatively
prime family of positive integers, and let a₁, …, aₖ be arbitrary integers. Then
there exists a solution a ∈ ℤ to the system of congruences

$$a \equiv a_i \pmod{n_i} \quad (i = 1, \ldots, k).$$

Moreover, any a′ ∈ ℤ is a solution to this system of congruences if and only if
a ≡ a′ (mod n), where n := ∏ᵢ₌₁ᵏ nᵢ.

Proof. To prove the existence of a solution $a$ to the system of congruences, we first show how to construct integers $e_1, \ldots, e_k$ such that for $i, j = 1, \ldots, k$, we have

$$e_j \equiv \begin{cases} 1 \pmod{n_i} & \text{if } j = i, \\ 0 \pmod{n_i} & \text{if } j \neq i. \end{cases} \qquad (2.6)$$

If we do this, then setting

$$a := \sum_{i=1}^{k} a_i e_i,$$

one sees that for $j = 1, \ldots, k$, we have

$$a = \sum_{i=1}^{k} a_i e_i \equiv a_j \pmod{n_j},$$

since all the terms in this sum are zero modulo $n_j$, except for the term $i = j$, which is congruent to $a_j$ modulo $n_j$.

To construct $e_1, \ldots, e_k$ satisfying (2.6), let $n := \prod_{i=1}^{k} n_i$, as in the statement of the theorem, and for $i = 1, \ldots, k$, let $n_i^* := n/n_i$; that is, $n_i^*$ is the product of all the moduli $n_j$ with $j \neq i$. From the fact that $\{n_i\}_{i=1}^k$ is pairwise relatively prime, it follows that for $i = 1, \ldots, k$, we have $\gcd(n_i, n_i^*) = 1$, and so we may define $t_i := (n_i^*)^{-1} \bmod n_i$ and $e_i := n_i^* t_i$. One sees that $e_i \equiv 1 \pmod{n_i}$, while for $j \neq i$, we have $n_i \mid n_j^*$, and so $e_j \equiv 0 \pmod{n_i}$. Thus, (2.6) is satisfied.

That proves the existence of a solution $a$ to the given system of congruences. If $a \equiv a' \pmod{n}$, then since $n_i \mid n$ for $i = 1, \ldots, k$, we see that $a' \equiv a \equiv a_i \pmod{n_i}$ for $i = 1, \ldots, k$, and so $a'$ also solves the system of congruences.

Finally, if $a'$ is a solution to the given system of congruences, then $a \equiv a_i \equiv a' \pmod{n_i}$ for $i = 1, \ldots, k$. Thus, $n_i \mid (a - a')$ for $i = 1, \ldots, k$. Since $\{n_i\}_{i=1}^k$ is pairwise relatively prime, this implies $n \mid (a - a')$, or equivalently, $a \equiv a' \pmod{n}$. $\square$
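The existence part of this proof is constructive, so it yields an algorithm directly. Below is a hedged Python sketch of exactly that construction; the name `crt` is our own, and it assumes Python 3.8+ for `math.prod` and for `pow(x, -1, m)`. It computes each $n_i^*$, $t_i$, and $e_i$ as in the proof and returns the solution reduced modulo $n$.

```python
from math import gcd, prod


def crt(residues, moduli):
    """Return the unique a in {0, ..., n-1} with a ≡ residues[i] (mod moduli[i]) for all i,
    where n is the product of the (pairwise relatively prime) moduli."""
    if len(residues) != len(moduli):
        raise ValueError("need exactly one residue per modulus")
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise relatively prime")
    n = prod(moduli)
    a = 0
    for a_i, n_i in zip(residues, moduli):
        n_star = n // n_i                               # n_i^*: product of the other moduli
        t_i = pow(n_star, -1, n_i) if n_i > 1 else 0    # t_i := (n_i^*)^{-1} mod n_i
        e_i = n_star * t_i                              # e_i ≡ 1 (mod n_i), ≡ 0 (mod n_j) for j != i
        a += a_i * e_i
    return a % n
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns 23, and indeed 23 ≡ 2 (mod 3), 23 ≡ 3 (mod 5), and 23 ≡ 2 (mod 7).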

We can restate Theorem 2.6 in more concrete terms, as follows. For each positive integer $m$, let $I_m$ denote $\{0, \ldots, m-1\}$. Suppose $\{n_i\}_{i=1}^k$ is a pairwise relatively prime family of positive integers, and set $n := n_1 \cdots n_k$. Then the map

$$\tau : I_n \to I_{n_1} \times \cdots \times I_{n_k}, \qquad a \mapsto (a \bmod n_1, \ldots, a \bmod n_k)$$

is a bijection.


Example 2.6. The following table illustrates what Theorem 2.6 says for $n_1 = 3$ and $n_2 = 5$.

a          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
a mod 3    0  1  2  0  1  2  0  1  2  0  1  2  0  1  2
a mod 5    0  1  2  3  4  0  1  2  3  4  0  1  2  3  4

We see that as $a$ ranges from 0 to 14, the pairs $(a \bmod 3, a \bmod 5)$ range over all pairs $(a_1, a_2)$ with $a_1 \in \{0, 1, 2\}$ and $a_2 \in \{0, \ldots, 4\}$, with every pair being hit exactly once. $\square$
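The bijection in Example 2.6 can also be checked mechanically. The Python fragment below (variable names are illustrative only) computes the pairs $(a \bmod 3, a \bmod 5)$ for $a = 0, \ldots, 14$, confirms that every pair occurs exactly once, and inverts the table.

```python
from itertools import product

n1, n2 = 3, 5
pairs = [(a % n1, a % n2) for a in range(n1 * n2)]

# Every pair (a1, a2) with a1 in {0, 1, 2} and a2 in {0, ..., 4} occurs exactly once.
assert sorted(pairs) == list(product(range(n1), range(n2)))

# Invert the table: recover a from its pair of remainders.
inverse = {pair: a for a, pair in enumerate(pairs)}
print(inverse[(1, 2)])   # 7
```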


Exercise 2.14. Compute the values $e_1, e_2, e_3$ in the proof of Theorem 2.6 in the case where $k = 3$, $n_1 = 3$, $n_2 = 5$, and $n_3 = 7$. Also, find an integer $a$ such that $a \equiv 1 \pmod{3}$, $a \equiv -1 \pmod{5}$, and $a \equiv 5 \pmod{7}$.

Exercise 2.15. If you want to show that you are a real nerd, here is an age-guessing game you might play at a party. You ask a fellow party-goer to divide his age by each of the numbers 3, 4, and 5, and tell you the remainders. Show how to use this information to determine his age.

Exercise 2.16. Let $\{n_i\}_{i=1}^k$ be a pairwise relatively prime family of positive integers. Let $a_1, \ldots, a_k$ and $b_1, \ldots, b_k$ be integers, and set $d_i := \gcd(a_i, n_i)$ for $i = 1, \ldots, k$. Show that there exists an integer $z$ such that $a_i z \equiv b_i \pmod{n_i}$ for $i = 1, \ldots, k$ if and only if $d_i \mid b_i$ for $i = 1, \ldots, k$.

Exercise 2.17. For each prime $p$, let $\nu_p(\cdot)$ be defined as in §1.3. Let $p_1, \ldots, p_r$ be distinct primes, $a_1, \ldots, a_r$ be arbitrary integers, and $e_1, \ldots, e_r$ be arbitrary non-negative integers. Show that there exists an integer $a$ such that $\nu_{p_i}(a - a_i) = e_i$ for $i = 1, \ldots, r$.

Exercise 2.18. Suppose $n_1$ and $n_2$ are positive integers, and let $d := \gcd(n_1, n_2)$. Let $a_1$ and $a_2$ be arbitrary integers. Show that there exists an integer $a$ such that $a \equiv a_1 \pmod{n_1}$ and $a \equiv a_2 \pmod{n_2}$ if and only if $a_1 \equiv a_2 \pmod{d}$.
2.5 Residue classes

As we already observed in Theorem 2.2, for any fixed positive integer $n$, the binary relation “$\cdot \equiv \cdot \pmod{n}$” is an equivalence relation on the set $\mathbb{Z}$. As such, this relation partitions the set $\mathbb{Z}$ into equivalence classes. We denote the equivalence class containing the integer $a$ by $[a]_n$, and when $n$ is clear from context, we simply write $[a]$. By definition, we have

$$z \in [a] \iff z \equiv a \pmod{n} \iff z = a + ny \text{ for some } y \in \mathbb{Z},$$

and hence

$$[a] = a + n\mathbb{Z} := \{a + ny : y \in \mathbb{Z}\}.$$

Historically, these equivalence classes are called residue classes modulo $n$, and we shall adopt this terminology here as well. Note that a given residue class modulo $n$ has many different “names”; for example, the residue class $[n-1]$ is the same as the residue class $[-1]$. Any member of a residue class is called a representative of that class.

We define $\mathbb{Z}_n$ to be the set of residue classes modulo $n$. The following is simply a restatement of Theorem 2.4:

Theorem 2.7. Let $n$ be a positive integer. Then $\mathbb{Z}_n$ consists of the $n$ distinct residue classes $[0], [1], \ldots, [n-1]$. Moreover, for every $x \in \mathbb{R}$, each residue class modulo $n$ contains a unique representative in the interval $[x, x+n)$.

When working with residue classes modulo $n$, one often has in mind a particular set of representatives. Typically, one works with the set of representatives $\{0, 1, \ldots, n-1\}$. However, sometimes it is convenient to work with another set of representatives, such as the representatives in the interval $[-n/2, n/2)$. In this case, if $n$ is odd, we can list the elements of $\mathbb{Z}_n$ as

$$[-(n-1)/2], \ldots, [-1], [0], [1], \ldots, [(n-1)/2],$$

and when $n$ is even, we can list the elements of $\mathbb{Z}_n$ as

$$[-n/2], \ldots, [-1], [0], [1], \ldots, [n/2 - 1].$$
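Theorem 2.7 also tells us how to pick the representative of a class from any half-open interval of length $n$. In the Python sketch below, the helper names `representative` and `centered` are hypothetical; `representative` rounds $x$ up to an integer and then shifts by a multiple of $n$, and `centered` specializes to the interval $[-n/2, n/2)$ discussed above.

```python
import math


def representative(a, n, x):
    """Return the unique integer r in [x, x + n) with r ≡ a (mod n), as in Theorem 2.7.
    x may be any real number; n must be a positive integer."""
    start = math.ceil(x)              # smallest integer >= x, so start < x + 1
    return start + (a - start) % n    # lies in [start, start + n - 1], inside [x, x + n)


def centered(a, n):
    """Representative of [a] in the interval [-n/2, n/2)."""
    return representative(a, n, -n / 2)


print([centered(a, 6) for a in range(6)])   # [0, 1, 2, -3, -2, -1]
```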

We can “equip” $\mathbb{Z}_n$ with binary operations defining addition and multiplication in a natural way as follows: for $a, b \in \mathbb{Z}$, we define

$$[a] + [b] := [a + b], \qquad [a] \cdot [b] := [a \cdot b].$$

Of course, one has to check that this definition is unambiguous, in the sense that the sum or product of two residue classes should not depend on which particular representatives of the classes are chosen in the above definitions. More precisely, one must check that if $[a] = [a']$ and $[b] = [b']$, then $[a + b] = [a' + b']$ and $[a \cdot b] = [a' \cdot b']$. However, this property follows immediately from Theorem 2.3. Observe that for all $a, b, c \in \mathbb{Z}$, we have

$$[a] + [b] = [c] \iff a + b \equiv c \pmod{n},$$

and

$$[a] \cdot [b] = [c] \iff a \cdot b \equiv c \pmod{n}.$$

Example 2.7. Consider the residue classes modulo 6. These are as follows:

[0] = {..., -12, -6, 0, 6, 12, ...}
[1] = {..., -11, -5, 1, 7, 13, ...}
[2] = {..., -10, -4, 2, 8, 14, ...}
[3] = {..., -9, -3, 3, 9, 15, ...}
[4] = {..., -8, -2, 4, 10, 16, ...}
[5] = {..., -7, -1, 5, 11, 17, ...}

Let us write down the addition and multiplication tables for $\mathbb{Z}_6$. The addition table looks like this:


 +  | [0] [1] [2] [3] [4] [5]
----+------------------------
[0] | [0] [1] [2] [3] [4] [5]
[1] | [1] [2] [3] [4] [5] [0]
[2] | [2] [3] [4] [5] [0] [1]
[3] | [3] [4] [5] [0] [1] [2]
[4] | [4] [5] [0] [1] [2] [3]
[5] | [5] [0] [1] [2] [3] [4]


The multiplication table looks like this:

 ·  | [0] [1] [2] [3] [4] [5]
----+------------------------
[0] | [0] [0] [0] [0] [0] [0]
[1] | [0] [1] [2] [3] [4] [5]
[2] | [0] [2] [4] [0] [2] [4]
[3] | [0] [3] [0] [3] [0] [3]
[4] | [0] [4] [2] [0] [4] [2]
[5] | [0] [5] [4] [3] [2] [1]


Instead of using representatives in the interval $[0, 6)$, we could just as well use representatives from another interval, such as $[-3, 3)$. Then, instead of naming the residue classes $[0], [1], [2], [3], [4], [5]$, we would name them $[-3], [-2], [-1], [0], [1], [2]$. Observe that $[-3] = [3]$, $[-2] = [4]$, and $[-1] = [5]$. $\square$
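The tables of Example 2.7 can be regenerated by reducing sums and products of the representatives $0, \ldots, n-1$ modulo $n$, which is legitimate precisely because the operations do not depend on the choice of representatives. A minimal Python sketch (the function name `tables` is our own):

```python
def tables(n):
    """Return the addition and multiplication tables of Z_n on representatives 0..n-1:
    add[a][b] and mul[a][b] are the representatives of [a] + [b] and [a] * [b]."""
    add = [[(a + b) % n for b in range(n)] for a in range(n)]
    mul = [[(a * b) % n for b in range(n)] for a in range(n)]
    return add, mul


add, mul = tables(6)
for row in mul:
    print(*row)
# 0 0 0 0 0 0
# 0 1 2 3 4 5
# 0 2 4 0 2 4
# 0 3 0 3 0 3
# 0 4 2 0 4 2
# 0 5 4 3 2 1
```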
These addition and multiplication operations on $\mathbb{Z}_n$ yield a very natural algebraic structure. For example, addition and multiplication are commutative and associative; that is, for all $\alpha, \beta, \gamma \in \mathbb{Z}_n$, we have

$$\alpha + \beta = \beta + \alpha, \qquad (\alpha + \beta) + \gamma = \alpha + (\beta + \gamma),$$
$$\alpha\beta = \beta\alpha, \qquad (\alpha\beta)\gamma = \alpha(\beta\gamma).$$

Note that we have adopted here the usual convention of writing $\alpha\beta$ in place of $\alpha \cdot \beta$. Furthermore, multiplication distributes over addition; that is, for all $\alpha, \beta, \gamma \in \mathbb{Z}_n$, we have

$$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma.$$

All of these properties follow from the definitions, and the corresponding properties for $\mathbb{Z}$; for example, the fact that addition in $\mathbb{Z}_n$ is commutative may be seen as follows: if $\alpha = [a]$ and $\beta = [b]$, then

$$\alpha + \beta = [a] + [b] = [a + b] = [b + a] = [b] + [a] = \beta + \alpha.$$

Because addition and multiplication in $\mathbb{Z}_n$ are associative, for $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n$, we may write the sum $\alpha_1 + \cdots + \alpha_k$ and the product $\alpha_1 \cdots \alpha_k$ without any parentheses, and there is no ambiguity; moreover, since both addition and multiplication are commutative, we may rearrange the terms in such sums and products without changing their values.

The residue class $[0]$ acts as an additive identity; that is, for all $\alpha \in \mathbb{Z}_n$, we have $\alpha + [0] = \alpha$; indeed, if $\alpha = [a]$, then $a + 0 \equiv a \pmod{n}$. Moreover, $[0]$ is the only element of $\mathbb{Z}_n$ that acts as an additive identity; indeed, if $a + z \equiv a \pmod{n}$ holds for all integers $a$, then it holds in particular for $a = 0$, which implies $z \equiv 0 \pmod{n}$. The residue class $[0]$ also has the property that $\alpha \cdot [0] = [0]$ for all $\alpha \in \mathbb{Z}_n$.

Every $\alpha \in \mathbb{Z}_n$ has an additive inverse, that is, an element $\beta \in \mathbb{Z}_n$ such that $\alpha + \beta = [0]$; indeed, if $\alpha = [a]$, then clearly $\beta := [-a]$ does the job, since $a + (-a) \equiv 0 \pmod{n}$. Moreover, $\alpha$ has a unique additive inverse; indeed, if $a + z \equiv 0 \pmod{n}$, then subtracting $a$ from both sides of this congruence yields $z \equiv -a \pmod{n}$. We naturally denote the additive inverse of $\alpha$ by $-\alpha$. Observe that the additive inverse of $-\alpha$ is $\alpha$; that is, $-(-\alpha) = \alpha$. Also, we have the identities

$$-(\alpha + \beta) = (-\alpha) + (-\beta), \qquad (-\alpha)\beta = -(\alpha\beta) = \alpha(-\beta), \qquad (-\alpha)(-\beta) = \alpha\beta.$$

For $\alpha, \beta \in \mathbb{Z}_n$, we naturally write $\alpha - \beta$ for $\alpha + (-\beta)$.

The residue class $[1]$ acts as a multiplicative identity; that is, for all $\alpha \in \mathbb{Z}_n$, we have $\alpha \cdot [1] = \alpha$; indeed, if $\alpha = [a]$, then $a \cdot 1 \equiv a \pmod{n}$. Moreover, $[1]$ is the only element of $\mathbb{Z}_n$ that acts as a multiplicative identity; indeed, if $a \cdot z \equiv a \pmod{n}$ holds for all integers $a$, then in particular, it holds for $a = 1$, which implies $z \equiv 1 \pmod{n}$.
For $\alpha \in \mathbb{Z}_n$, we call $\beta \in \mathbb{Z}_n$ a multiplicative inverse of $\alpha$ if $\alpha\beta = [1]$. Not all $\alpha \in \mathbb{Z}_n$ have multiplicative inverses. If $\alpha = [a]$ and $\beta = [b]$, then $\beta$ is a multiplicative inverse of $\alpha$ if and only if $ab \equiv 1 \pmod{n}$. Theorem 2.5 implies that $\alpha$ has a multiplicative inverse if and only if $\gcd(a, n) = 1$, and that if it exists, it is unique. When it exists, we denote the multiplicative inverse of $\alpha$ by $\alpha^{-1}$.
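The text takes the existence and uniqueness of inverses from Theorem 2.5; to actually compute $\alpha^{-1}$, a standard choice (not spelled out in this section) is the extended Euclidean algorithm. A hedged Python sketch with our own function names `extended_gcd` and `inverse_mod`:

```python
def extended_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) = a*s + b*t (extended Euclidean algorithm)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def inverse_mod(a, n):
    """Return b in {0, ..., n-1} with a*b ≡ 1 (mod n), or None if gcd(a, n) != 1."""
    d, s, _ = extended_gcd(a % n, n)
    if d != 1:
        return None      # [a] is not in Z_n^*
    return s % n         # a*s ≡ 1 (mod n), so [s] is the inverse of [a]
```

For instance, `inverse_mod(2, 15)` returns 8, since 2 · 8 = 16 ≡ 1 (mod 15), while `inverse_mod(3, 15)` returns `None`. In Python 3.8 and later, the built-in `pow(a, -1, n)` computes the same inverse (raising `ValueError` when it does not exist).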

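We define $\mathbb{Z}_n^*$ to be the set of elements of $\mathbb{Z}_n$ that have a multiplicative inverse. By the above discussion, we have

$$\mathbb{Z}_n^* = \{[a] : a = 0, \ldots, n-1,\ \gcd(a, n) = 1\}.$$

If $n$ is prime, then $\gcd(a, n) = 1$ for $a = 1, \ldots, n-1$, and we see that $\mathbb{Z}_n^* = \mathbb{Z}_n \setminus \{[0]\}$. If $n$ is composite, then $\mathbb{Z}_n^* \subsetneq \mathbb{Z}_n \setminus \{[0]\}$; for example, if $d \mid n$ with $1 < d < n$, we see that $[d]$ is not zero, nor does it belong to $\mathbb{Z}_n^*$. Observe that if $\alpha, \beta \in \mathbb{Z}_n^*$, then so are $\alpha^{-1}$ and $\alpha\beta$; indeed,

$$(\alpha^{-1})^{-1} = \alpha \quad \text{and} \quad (\alpha\beta)^{-1} = \alpha^{-1}\beta^{-1}.$$

For $\alpha \in \mathbb{Z}_n$ and $\beta \in \mathbb{Z}_n^*$, we naturally write $\alpha/\beta$ for $\alpha\beta^{-1}$.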

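Suppose $\alpha, \beta, \gamma$ are elements of $\mathbb{Z}_n$ that satisfy the equation

$$\alpha\beta = \alpha\gamma.$$

If $\alpha \in \mathbb{Z}_n^*$, we may multiply both sides of this equation by $\alpha^{-1}$ to infer that

$$\beta = \gamma.$$

This is the cancellation law for $\mathbb{Z}_n$. We stress the requirement that $\alpha \in \mathbb{Z}_n^*$, and not just $\alpha \neq [0]$. Indeed, consider any $\alpha \in \mathbb{Z}_n \setminus \mathbb{Z}_n^*$. Then we have $\alpha = [a]$ with $d := \gcd(a, n) > 1$. Setting $\beta := [n/d]$ and $\gamma := [0]$, we see that

$$\alpha\beta = \alpha\gamma \quad \text{and} \quad \beta \neq \gamma,$$

since $a \cdot (n/d) = (a/d) \cdot n \equiv 0 \pmod{n}$, while $0 < n/d < n$. For instance, with $n = 6$ we have $[2] \cdot [3] = [0] = [2] \cdot [0]$, although $[3] \neq [0]$.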

Example 2.8. We list the elements of $\mathbb{Z}_{15}^*$, and for each $\alpha \in \mathbb{Z}_{15}^*$, we also give $\alpha^{-1}$:

α        [1]  [2]  [4]  [7]  [8]  [11] [13] [14]
α^(-1)   [1]  [8]  [4]  [13] [2]  [11] [7]  [14]
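The table in Example 2.8 can be reproduced by listing the representatives in $\{0, \ldots, n-1\}$ that are relatively prime to $n$, together with their inverses. This short Python sketch (hypothetical name `units_with_inverses`, intended for $n \ge 2$) relies on the built-in `pow(a, -1, n)`, available in Python 3.8 and later.

```python
from math import gcd


def units_with_inverses(n):
    """List (a, a^{-1} mod n) for the representatives a in {0, ..., n-1} of Z_n^*, n >= 2."""
    return [(a, pow(a, -1, n)) for a in range(n) if gcd(a, n) == 1]


print(units_with_inverses(15))
# [(1, 1), (2, 8), (4, 4), (7, 13), (8, 2), (11, 11), (13, 7), (14, 14)]
```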


For $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n$, we may naturally write their sum as $\sum_{i=1}^k \alpha_i$. By convention, this sum is $[0]$ when $k = 0$. It is easy to see that $-\sum_{i=1}^k \alpha_i = \sum_{i=1}^k (-\alpha_i)$; that is, the additive inverse of the sum is the sum of the additive inverses. In the special case where all the $\alpha_i$'s have the same value $\alpha$, we define $k \cdot \alpha := \sum_{i=1}^k \alpha$; thus, $0 \cdot \alpha = [0]$, $1 \cdot \alpha = \alpha$, $2 \cdot \alpha = \alpha + \alpha$, $3 \cdot \alpha = \alpha + \alpha + \alpha$, and so on. The additive inverse of $k \cdot \alpha$ is $k \cdot (-\alpha)$, which we may also write as $(-k) \cdot \alpha$; thus, $(-1) \cdot \alpha = -\alpha$, $(-2) \cdot \alpha = (-\alpha) + (-\alpha) = -(\alpha + \alpha)$, and so on. Therefore, the notation $k \cdot \alpha$, or more simply, $k\alpha$, is defined for all integers $k$. Note that for all integers $k$ and $a$, we have $k[a] = [ka] = [k][a]$.
For all $\alpha, \beta \in \mathbb{Z}_n$ and $k, \ell \in \mathbb{Z}$, we have the identities:

$$k(\ell\alpha) = (k\ell)\alpha = \ell(k\alpha), \qquad (k + \ell)\alpha = k\alpha + \ell\alpha, \qquad k(\alpha + \beta) = k\alpha + k\beta,$$
$$(k\alpha)\beta = k(\alpha\beta) = \alpha(k\beta).$$

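Analogously, for $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n$, we may write their product as $\prod_{i=1}^k \alpha_i$. By convention, this product is $[1]$ when $k = 0$. It is easy to see that if all of the $\alpha_i$'s belong to $\mathbb{Z}_n^*$, then so does their product, and in particular,

$$\Big(\prod_{i=1}^k \alpha_i\Big)^{-1} = \prod_{i=1}^k \alpha_i^{-1};$$

that is, the multiplicative inverse of the product is the product of the multiplicative inverses. In the special case where all the $\alpha_i$'s have the same value $\alpha$, we define $\alpha^k := \prod_{i=1}^k \alpha$; thus, $\alpha^0 = [1]$, $\alpha^1 = \alpha$, $\alpha^2 = \alpha\alpha$, $\alpha^3 = \alpha\alpha\alpha$, and so on. If $\alpha \in \mathbb{Z}_n^*$, then the multiplicative inverse of $\alpha^k$ is $(\alpha^{-1})^k$, which we may also write as $\alpha^{-k}$; for example, $\alpha^{-2} = \alpha^{-1}\alpha^{-1} = (\alpha\alpha)^{-1}$. Therefore, when $\alpha \in \mathbb{Z}_n^*$, the notation $\alpha^k$ is defined for all integers $k$.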

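For all $\alpha, \beta \in \mathbb{Z}_n$ and all non-negative integers $k$ and $\ell$, we have the identities:

$$(\alpha^\ell)^k = \alpha^{k\ell} = (\alpha^k)^\ell, \qquad \alpha^{k+\ell} = \alpha^k \alpha^\ell, \qquad (\alpha\beta)^k = \alpha^k \beta^k. \qquad (2.7)$$

If $\alpha, \beta \in \mathbb{Z}_n^*$, the identities in (2.7) hold for all $k, \ell \in \mathbb{Z}$.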
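Computing $\alpha^k$ with $k - 1$ successive multiplications is slow for large $k$. A standard alternative, not discussed in this section, is repeated squaring, which uses the identities (2.7) to process the binary digits of $k$. The Python sketch below (hypothetical name `power_mod`) handles negative $k$ through the inverse, matching the definition of $\alpha^{-k}$ above. Python's built-in three-argument `pow` already implements the same idea, so this is for illustration only.

```python
def power_mod(a, k, n):
    """Return the representative in {0, ..., n-1} of [a]^k in Z_n (square-and-multiply).
    For k < 0, [a] must lie in Z_n^*; then [a]^k = ([a]^{-1})^(-k)."""
    if k < 0:
        a, k = pow(a, -1, n), -k        # ValueError if gcd(a, n) != 1 (Python 3.8+)
    result, base = 1 % n, a % n
    while k > 0:
        if k & 1:                       # lowest remaining bit of k is set
            result = (result * base) % n
        base = (base * base) % n        # base runs through a, a^2, a^4, a^8, ... mod n
        k >>= 1
    return result
```

For example, `power_mod(2, -2, 15)` returns 4: the inverse of [2] is [8], and [8]^2 = [64] = [4].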

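For all $\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_\ell \in \mathbb{Z}_n$, the distributive property implies that

$$(\alpha_1 + \cdots + \alpha_k)(\beta_1 + \cdots + \beta_\ell) = \sum_{\substack{1 \le i \le k \\ 1 \le j \le \ell}} \alpha_i \beta_j.$$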

One last notational convention. As already mentioned, when the modulus $n$ is clear from context, we usually write $[a]$ instead of $[a]_n$. Although we want to maintain a clear distinction between integers and their residue classes, occasionally even the notation $[a]$ is not only redundant, but distracting; in such situations, we may simply write $a$ instead of $[a]$. For example, for every $\alpha \in \mathbb{Z}_n$, we have the identity $(\alpha + [1]_n)(\alpha - [1]_n) = \alpha^2 - [1]_n$, which we may write more simply as $(\alpha + [1])(\alpha - [1]) = \alpha^2 - [1]$, or even more simply, and hopefully more clearly, as $(\alpha + 1)(\alpha - 1) = \alpha^2 - 1$. Here, the only reasonable interpretation of the symbol “1” is $[1]$, and so there can be no confusion.

In summary, algebraic expressions involving residue classes may be manipulated in much the same way as expressions involving ordinary numbers. Extra complications arise only because when $n$ is composite, some non-zero elements of $\mathbb{Z}_n$ do not have multiplicative inverses, and the usual cancellation law does not apply for such elements.

In general, one has a choice between working with congruences modulo $n$, or with the algebraic structure $\mathbb{Z}_n$; ultimately, the choice is one of taste and convenience, and it depends on what one prefers to treat as “first class objects”: integers and congruence relations, or elements of $\mathbb{Z}_n$.
An alternative, and somewhat more concrete, approach to constructing $\mathbb{Z}_n$ is to directly define it as the set of $n$ “symbols” $[0], [1], \ldots, [n-1]$, with addition and multiplication defined as

$$[a] + [b] := [(a + b) \bmod n], \qquad [a] \cdot [b] := [(a \cdot b) \bmod n],$$

for $a, b \in \{0, \ldots, n-1\}$. Such a definition is equivalent to the one we have given here. One should keep this alternative characterization of $\mathbb{Z}_n$ in mind; however, we prefer the characterization in terms of residue classes, as it is mathematically more elegant, and is usually more convenient to work with.

We close this section with a reinterpretation of the Chinese remainder theorem 
(Theorem 2.6) in terms of residue classes. 

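Theorem 2.8 (Chinese remainder map). Let $\{n_i\}_{i=1}^k$ be a pairwise relatively prime family of positive integers, and let $n := \prod_{i=1}^k n_i$. Define the map

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}, \qquad [a]_n \mapsto ([a]_{n_1}, \ldots, [a]_{n_k}).$$

(i) The definition of $\theta$ is unambiguous.

(ii) $\theta$ is bijective.

(iii) For all $\alpha, \beta \in \mathbb{Z}_n$, if $\theta(\alpha) = (\alpha_1, \ldots, \alpha_k)$ and $\theta(\beta) = (\beta_1, \ldots, \beta_k)$, then:

(a) $\theta(\alpha + \beta) = (\alpha_1 + \beta_1, \ldots, \alpha_k + \beta_k)$;

(b) $\theta(-\alpha) = (-\alpha_1, \ldots, -\alpha_k)$;

(c) $\theta(\alpha\beta) = (\alpha_1\beta_1, \ldots, \alpha_k\beta_k)$;

(d) $\alpha \in \mathbb{Z}_n^*$ if and only if $\alpha_i \in \mathbb{Z}_{n_i}^*$ for $i = 1, \ldots, k$, in which case

$$\theta(\alpha^{-1}) = (\alpha_1^{-1}, \ldots, \alpha_k^{-1}).$$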

Proof. For (i), note that $a \equiv a' \pmod{n}$ implies $a \equiv a' \pmod{n_i}$ for $i = 1, \ldots, k$, and so the definition of $\theta$ is unambiguous (it does not depend on the choice of $a$). (ii) follows directly from the statement of the Chinese remainder theorem.

For (iii), let $\alpha = [a]_n$ and $\beta = [b]_n$, so that for $i = 1, \ldots, k$, we have $\alpha_i = [a]_{n_i}$ and $\beta_i = [b]_{n_i}$. Then we have

$$\begin{aligned}
\theta(\alpha + \beta) &= \theta([a + b]_n) = ([a + b]_{n_1}, \ldots, [a + b]_{n_k}) = (\alpha_1 + \beta_1, \ldots, \alpha_k + \beta_k),\\
\theta(-\alpha) &= \theta([-a]_n) = ([-a]_{n_1}, \ldots, [-a]_{n_k}) = (-\alpha_1, \ldots, -\alpha_k), \text{ and}\\
\theta(\alpha\beta) &= \theta([ab]_n) = ([ab]_{n_1}, \ldots, [ab]_{n_k}) = (\alpha_1\beta_1, \ldots, \alpha_k\beta_k).
\end{aligned}$$





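That proves parts (a), (b), and (c). For part (d), we have

$$\begin{aligned}
\alpha \in \mathbb{Z}_n^* &\iff \gcd(a, n) = 1\\
&\iff \gcd(a, n_i) = 1 \text{ for } i = 1, \ldots, k\\
&\iff \alpha_i \in \mathbb{Z}_{n_i}^* \text{ for } i = 1, \ldots, k.
\end{aligned}$$

Moreover, if $\alpha \in \mathbb{Z}_n^*$ and $\beta = \alpha^{-1}$, then

$$(\alpha_1\beta_1, \ldots, \alpha_k\beta_k) = \theta(\alpha\beta) = \theta([1]_n) = ([1]_{n_1}, \ldots, [1]_{n_k}),$$

and so for $i = 1, \ldots, k$, we have $\alpha_i\beta_i = [1]_{n_i}$, which is to say $\beta_i = \alpha_i^{-1}$. $\square$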
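Parts (ii) and (iii) of Theorem 2.8 can be confirmed exhaustively for a small modulus. The Python fragment below is an illustrative check, not part of the proof: `theta` is a hypothetical helper acting on integer representatives, and the family (3, 5) is an arbitrary choice.

```python
from math import gcd, prod


def theta(a, moduli):
    """Chinese remainder map on representatives: a mod n  ->  (a mod n_1, ..., a mod n_k)."""
    return tuple(a % m for m in moduli)


moduli = (3, 5)                 # illustrative pairwise relatively prime family
n = prod(moduli)
for a in range(n):
    ta = theta(a, moduli)
    # (iii)(b): theta(-alpha) = (-alpha_1, ..., -alpha_k)
    assert theta(-a, moduli) == tuple((-x) % m for x, m in zip(ta, moduli))
    # (iii)(d): alpha is a unit iff every component alpha_i is a unit
    assert (gcd(a, n) == 1) == all(gcd(x, m) == 1 for x, m in zip(ta, moduli))
    for b in range(n):
        tb = theta(b, moduli)
        # (iii)(a) and (iii)(c): addition and multiplication act componentwise
        assert theta(a + b, moduli) == tuple((x + y) % m for x, y, m in zip(ta, tb, moduli))
        assert theta(a * b, moduli) == tuple((x * y) % m for x, y, m in zip(ta, tb, moduli))
# (ii): theta is a bijection, so its n images are pairwise distinct
assert len({theta(a, moduli) for a in range(n)}) == n
```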

Theorem 2.8 is very powerful conceptually, and is an indispensable tool in many situations. It says that if we want to understand what happens when we add or multiply $\alpha, \beta \in \mathbb{Z}_n$, it suffices to understand what happens when we add or multiply their “components” $\alpha_i, \beta_i \in \mathbb{Z}_{n_i}$. Typically, we choose $n_1, \ldots, n_k$ to be primes or prime powers, which usually simplifies the analysis. We shall see many applications of this idea throughout the text.

Exercise 2.19. Let $\theta : \mathbb{Z}_n \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$ be as in Theorem 2.8, and suppose that $\theta(\alpha) = (\alpha_1, \ldots, \alpha_k)$. Show that for every non-negative integer $m$, we have $\theta(\alpha^m) = (\alpha_1^m, \ldots, \alpha_k^m)$. Moreover, if $\alpha \in \mathbb{Z}_n^*$, show that this identity holds for all integers $m$.

Exercise 2.20. Let $p$ be an odd prime. Show that $\sum_{\beta \in \mathbb{Z}_p^*} \beta^{-1} = \sum_{\beta \in \mathbb{Z}_p^*} \beta = 0$.

Exercise 2.21. Let $p$ be an odd prime. Show that the numerator of $\sum_{i=1}^{p-1} 1/i$ is divisible by $p$.

Exercise 2.22. Suppose $n$ is square-free (see Exercise 1.15), and let $\alpha, \beta, \gamma \in \mathbb{Z}_n$. Show that $\alpha^2\beta = \alpha^2\gamma$ implies $\alpha\beta = \alpha\gamma$.


2.6 Euler’s phi function 

Euler’s phi function (also called Euler’s totient function) is defined for all positive integers $n$ as

$$\varphi(n) := |\mathbb{Z}_n^*|.$$

Equivalently, $\varphi(n)$ is equal to the number of integers between $0$ and $n - 1$ that are relatively prime to $n$. For example, $\varphi(1) = 1$, $\varphi(2) = 1$, $\varphi(3) = 2$, and $\varphi(4) = 2$.
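The second characterization can be checked directly by counting. This brute-force Python sketch (hypothetical name `phi_naive`) follows it literally; it takes time proportional to $n$, so it is only meant for small inputs.

```python
from math import gcd


def phi_naive(n):
    """Euler's phi function by definition: count a in {0, ..., n-1} with gcd(a, n) = 1."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)


print([phi_naive(n) for n in range(1, 5)])   # [1, 1, 2, 2]
```

Note that $\varphi(1) = 1$ comes out correctly because gcd(0, 1) = 1.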

Using the Chinese remainder theorem, more specifically Theorem 2.8, it is easy to get a nice formula for $\varphi(n)$ in terms of the prime factorization of $n$, as we establish in the following sequence of theorems.
Theorem 2.9. Let $\{n_i\}_{i=1}^k$ be a pairwise relatively prime family of positive integers, and let $n := \prod_{i=1}^k n_i$. Then

$$\varphi(n) = \prod_{i=1}^k \varphi(n_i).$$

Proof. Consider the map $\theta : \mathbb{Z}_n \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$ in Theorem 2.8. By parts (ii) and (iii.d) of that theorem, restricting $\theta$ to $\mathbb{Z}_n^*$ yields a one-to-one correspondence between $\mathbb{Z}_n^*$ and $\mathbb{Z}_{n_1}^* \times \cdots \times \mathbb{Z}_{n_k}^*$. The theorem now follows immediately. $\square$

We already know that $\varphi(p) = p - 1$ for every prime $p$, since the integers $1, \ldots, p-1$ are not divisible by $p$, and hence are relatively prime to $p$. The next theorem generalizes this, giving us a formula for Euler’s phi function at prime powers.

Theorem 2.10. Let $p$ be a prime and $e$ be a positive integer. Then

$$\varphi(p^e) = p^{e-1}(p - 1).$$

Proof. The multiples of $p$ among $0, 1, \ldots, p^e - 1$ are

$$0 \cdot p,\ 1 \cdot p,\ \ldots,\ (p^{e-1} - 1) \cdot p,$$

of which there are precisely $p^{e-1}$. Thus, $\varphi(p^e) = p^e - p^{e-1} = p^{e-1}(p - 1)$. $\square$

If $n = p_1^{e_1} \cdots p_r^{e_r}$ is the factorization of $n$ into primes, then the family of prime powers $\{p_i^{e_i}\}_{i=1}^r$ is pairwise relatively prime, and so Theorem 2.9 implies $\varphi(n) = \varphi(p_1^{e_1}) \cdots \varphi(p_r^{e_r})$. Combining this with Theorem 2.10, we have:

Theorem 2.11. If $n = p_1^{e_1} \cdots p_r^{e_r}$ is the factorization of $n$ into primes, then

$$\varphi(n) = \prod_{i=1}^r p_i^{e_i - 1}(p_i - 1) = n \prod_{i=1}^r \Big(1 - \frac{1}{p_i}\Big).$$
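Theorem 2.11 gives a much faster way to compute $\varphi(n)$ once $n$ is factored. The sketch below pairs the formula with plain trial division, which is our own choice of factoring method and is adequate only for moderately sized $n$; the names `factorize` and `phi` are hypothetical.

```python
def factorize(n):
    """Trial-division factorization of n >= 1: return {p: e} with n = prod of p**e."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(n):
    """Euler's phi via Theorem 2.11: product of p^(e-1) * (p - 1) over n = prod p^e."""
    result = 1
    for p, e in factorize(n).items():
        result *= p ** (e - 1) * (p - 1)
    return result
```

For example, $36 = 2^2 \cdot 3^2$, so `phi(36)` evaluates $2 \cdot 1 \cdot 3 \cdot 2 = 12$.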

Exercise 2.23. Show that $\varphi(nm) = \gcd(n, m) \cdot \varphi(\operatorname{lcm}(n, m))$.

Exercise 2.24. Show that if $n$ is divisible by $r$ distinct odd primes, then $2^r \mid \varphi(n)$.

Exercise 2.25. Define $\varphi_2(n)$ to be the number of integers $a \in \{0, \ldots, n-1\}$ such that $\gcd(a, n) = \gcd(a + 1, n) = 1$. Show that if $n = p_1^{e_1} \cdots p_r^{e_r}$ is the factorization of $n$ into primes, then $\varphi_2(n) = n \prod_{i=1}^r (1 - 2/p_i)$.


2.7 Euler’s theorem and Fermat’s little theorem 

Let $n$ be a positive integer, and let $\alpha \in \mathbb{Z}_n^*$. Consider the sequence of powers of $\alpha$:

$$1 = \alpha^0, \alpha^1, \alpha^2, \ldots.$$

Since each such power is an element of $\mathbb{Z}_n^*$, and since $\mathbb{Z}_n^*$ is a finite set, this sequence of powers must start to repeat at some point; that is, there must be a positive integer $k$ such that $\alpha^k = \alpha^i$ for some $i = 0, \ldots, k-1$. Let us assume that $k$ is chosen to be the smallest such positive integer. This value $k$ is called the multiplicative order of $\alpha$.

We claim that $\alpha^k = 1$. To see this, suppose by way of contradiction that $\alpha^k = \alpha^i$, for some $i = 1, \ldots, k-1$; we could then cancel $\alpha$ from both sides of the equation $\alpha^k = \alpha^i$, obtaining $\alpha^{k-1} = \alpha^{i-1}$, which would contradict the minimality of $k$.

Thus, we can characterize the multiplicative order of $\alpha$ as the smallest positive integer $k$ such that

$$\alpha^k = 1.$$

If $\alpha = [a]$ with $a \in \mathbb{Z}$ (and $\gcd(a, n) = 1$, since $\alpha \in \mathbb{Z}_n^*$), then $k$ is also called the multiplicative order of $a$ modulo $n$, and can be characterized as the smallest positive integer $k$ such that

$$a^k \equiv 1 \pmod{n}.$$

From the above discussion, we see that the first $k$ powers of $\alpha$, that is, $\alpha^0, \alpha^1, \ldots, \alpha^{k-1}$, are distinct. Moreover, other powers of $\alpha$ simply repeat this pattern. The following is an immediate consequence of this observation.

Theorem 2.12. Let $n$ be a positive integer, and let $\alpha$ be an element of $\mathbb{Z}_n^*$ of multiplicative order $k$. Then for every $i \in \mathbb{Z}$, we have $\alpha^i = 1$ if and only if $k$ divides $i$. More generally, for all $i, j \in \mathbb{Z}$, we have $\alpha^i = \alpha^j$ if and only if $i \equiv j \pmod{k}$.
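The characterization of the multiplicative order as the smallest $k \ge 1$ with $a^k \equiv 1 \pmod{n}$ suggests a naive search: keep multiplying by $a$ until the power returns to 1, which must happen because the powers of a unit eventually repeat, as argued above. A hedged Python sketch (hypothetical name `multiplicative_order`), practical only for small $n$:

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest k >= 1 with a^k ≡ 1 (mod n); requires gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be relatively prime to n")
    k, x = 1, a % n
    while x != 1 % n:          # 1 % n is the representative of [1] (0 when n = 1)
        x = (x * a) % n
        k += 1
    return k


print([multiplicative_order(a, 7) for a in range(1, 7)])   # [1, 3, 6, 3, 6, 2]
```

The printed orders agree with Example 2.9.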

Example 2.9. Let n = 7. For each value a = 1, . . . , 6, we can compute successive 
powers of a modulo n to find its multiplicative order modulo n. 


i            1  2  3  4  5  6
1^i mod 7    1  1  1  1  1  1
2^i mod 7    2  4  1  2  4  1
3^i mod 7    3  2  6  4  5  1
4^i mod 7    4  2  1  4  2  1
5^i mod 7    5  4  6  2  3  1
6^i mod 7    6  1  6  1  6  1


So we conclude that modulo 7: 1 has order 1; 6 has order 2; 2 and 4 have order 3; 
and 3 and 5 have order 6. □ 
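Theorem 2.13 (Euler’s theorem). Let $n$ be a positive integer and $\alpha \in \mathbb{Z}_n^*$. Then $\alpha^{\varphi(n)} = 1$. In particular, the multiplicative order of $\alpha$ divides $\varphi(n)$.

Proof. Since $\alpha \in \mathbb{Z}_n^*$, for every $\beta \in \mathbb{Z}_n^*$ we have $\alpha\beta \in \mathbb{Z}_n^*$, and so we may define the “multiplication by $\alpha$” map

$$\tau_\alpha : \mathbb{Z}_n^* \to \mathbb{Z}_n^*, \qquad \beta \mapsto \alpha\beta.$$

It is easy to see that $\tau_\alpha$ is a bijection:

Injectivity: If $\alpha\beta = \alpha\beta'$, then cancel $\alpha$ to obtain $\beta = \beta'$.

Surjectivity: For every $\gamma \in \mathbb{Z}_n^*$, $\alpha^{-1}\gamma$ is a pre-image of $\gamma$ under $\tau_\alpha$.

Thus, as $\beta$ ranges over the set $\mathbb{Z}_n^*$, so does $\alpha\beta$, and we have

$$\prod_{\beta \in \mathbb{Z}_n^*} \beta = \prod_{\beta \in \mathbb{Z}_n^*} (\alpha\beta) = \alpha^{\varphi(n)} \Big( \prod_{\beta \in \mathbb{Z}_n^*} \beta \Big). \qquad (2.8)$$

Canceling the common factor $\prod_{\beta \in \mathbb{Z}_n^*} \beta \in \mathbb{Z}_n^*$ from the left- and right-hand side of (2.8), we obtain

$$1 = \alpha^{\varphi(n)}.$$

That proves the first statement of the theorem. The second follows immediately from Theorem 2.12. $\square$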
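Euler’s theorem is easy to confirm numerically for small moduli. The brute-force Python check below (hypothetical name `check_euler`) verifies both statements of the theorem for a single $n$; it is an illustration, not a substitute for the proof, and it is slow for large $n$.

```python
from math import gcd


def check_euler(n):
    """Check Theorem 2.13 for one modulus n >= 1 by brute force."""
    one = 1 % n                                   # the class [1] (equals 0 when n = 1)
    units = [a for a in range(n) if gcd(a, n) == 1]
    phi_n = len(units)                            # phi(n) = |Z_n^*|
    for a in units:
        if pow(a, phi_n, n) != one:               # is alpha^{phi(n)} = 1 ?
            return False
        order = next(k for k in range(1, phi_n + 1) if pow(a, k, n) == one)
        if phi_n % order != 0:                    # does the order divide phi(n) ?
            return False
    return True


print(all(check_euler(n) for n in range(1, 100)))   # True
```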

As a consequence of this, we obtain: 

Theorem 2.14 (Fermat’s little theorem). For every prime $p$, and every $\alpha \in \mathbb{Z}_p$, we have $\alpha^p = \alpha$.

Proof. If $\alpha = 0$, the statement is obviously true. Otherwise, $\alpha \in \mathbb{Z}_p^*$, and by Theorem 2.13 we have $\alpha^{p-1} = 1$. Multiplying this equation by $\alpha$ yields $\alpha^p = \alpha$. $\square$

In the language of congruences, Fermat’s little theorem says that for every prime $p$ and every integer $a$, we have

$$a^p \equiv a \pmod{p}.$$

For a given positive integer $n$, we say that $a \in \mathbb{Z}$ with $\gcd(a, n) = 1$ is a primitive root modulo $n$ if the multiplicative order of $a$ modulo $n$ is equal to $\varphi(n)$. If this is the case, then for $\alpha := [a] \in \mathbb{Z}_n^*$, the powers $\alpha^i$ range over all elements of $\mathbb{Z}_n^*$ as $i$ ranges over the interval $0, \ldots, \varphi(n) - 1$. Not all positive integers have primitive roots: we will see in §7.5 that the only positive integers $n$ for which there exists a primitive root modulo $n$ are

$$n = 1, 2, 4, p^e, 2p^e,$$

where $p$ is an odd prime and $e$ is a positive integer.
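For small $n$, primitive roots can be found by brute force straight from the definition. The Python sketch below (hypothetical name `primitive_roots`) tests whether the powers $a^0, \ldots, a^{\varphi(n)-1}$ hit every unit. Since the first $k$ powers are distinct and then repeat, where $k$ is the order of $a$, this happens exactly when $k = \varphi(n)$.

```python
from math import gcd


def primitive_roots(n):
    """Return all a in {1, ..., n-1} that are primitive roots modulo n (n >= 2)."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    phi_n = len(units)
    roots = []
    for a in units:
        # the powers a^0, ..., a^(phi(n)-1) are all distinct iff a has order phi(n)
        powers = {pow(a, i, n) for i in range(phi_n)}
        if len(powers) == phi_n:
            roots.append(a)
    return roots


print(primitive_roots(7))   # [3, 5]
print(primitive_roots(8))   # []
```

The result for 7 matches Example 2.9, and 8 is not of the form $1, 2, 4, p^e, 2p^e$, so it has no primitive roots.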
The following theorem is sometimes useful in determining the multiplicative order of an element in $\mathbb{Z}_n^*$.

Theorem 2.15. Suppose $\alpha \in \mathbb{Z}_n^*$ has multiplicative order $k$. Then for every $m \in \mathbb{Z}$, the multiplicative order of $\alpha^m$ is $k/\gcd(m, k)$.

Proof. Applying Theorem 2.12 to $\alpha^m$, we see that the multiplicative order of $\alpha^m$ is the smallest positive integer $\ell$ such that $\alpha^{m\ell} = 1$. But we have

$$\begin{aligned}
\alpha^{m\ell} = 1 &\iff m\ell \equiv 0 \pmod{k} && \text{(applying Theorem 2.12 to } \alpha\text{)}\\
&\iff \ell \equiv 0 \pmod{k/\gcd(m, k)} && \text{(by part (ii) of Theorem 2.5).}
\end{aligned}$$

Hence the smallest such positive integer $\ell$ is $k/\gcd(m, k)$. $\square$
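Theorem 2.15 can be spot-checked exhaustively for a small prime modulus. The fragment below is illustrative only: `order` is a hypothetical naive helper, the modulus 13 is an arbitrary choice, and negative exponents rely on `pow(a, m, n)` accepting them for units, which Python supports from version 3.8.

```python
from math import gcd


def order(a, n):
    """Multiplicative order of a modulo n, assuming gcd(a, n) = 1 (naive search)."""
    k, x = 1, a % n
    while x != 1 % n:
        x, k = (x * a) % n, k + 1
    return k


n = 13                                   # illustrative modulus
for a in range(1, n):
    k = order(a, n)
    for m in range(-20, 21):
        # Theorem 2.15: the order of a^m is k / gcd(m, k)
        assert order(pow(a, m, n), n) == k // gcd(m, k)
```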

Exercise 2.26. Find all elements of $\mathbb{Z}_{19}^*$ of multiplicative order 18.

Exercise 2.27. Let $n \in \mathbb{Z}$ with $n > 1$. Show that $n$ is prime if and only if $\alpha^{n-1} = 1$ for every non-zero $\alpha \in \mathbb{Z}_n$.

Exercise 2.28. Let $n = pq$, where $p$ and $q$ are distinct primes. Show that if $m := \operatorname{lcm}(p - 1, q - 1)$, then $\alpha^m = 1$ for all $\alpha \in \mathbb{Z}_n^*$.

Exercise 2.29. Let $p$ be any prime other than 2 or 5. Show that $p$ divides infinitely many of the numbers 9, 99, 999, etc.

Exercise 2.30. Let $n$ be an integer greater than 1. Show that $n$ does not divide $2^n - 1$.

Exercise 2.31. Prove the following generalization of Fermat’s little theorem: for every positive integer $n$, and every $\alpha \in \mathbb{Z}_n$, we have $\alpha^n = \alpha^{n - \varphi(n)}$.

Exercise 2.32. This exercise develops an alternative proof of Fermat’s little theorem.

(a) Using Exercise 1.14, show that for all primes $p$ and integers $a$, we have $(a + 1)^p \equiv a^p + 1 \pmod{p}$.

(b) Now derive Fermat’s little theorem from part (a).


2.8 Quadratic residues 

In §2.3, we studied linear congruences. It is natural to study congruences of higher degree as well. In this section, we study a special case of this more general problem, namely, congruences of the form $z^2 \equiv a \pmod{n}$. The theory we develop here nicely illustrates many of the ideas we have discussed earlier, and has a number of interesting applications as well.
We begin with some preliminary definitions and general observations about powers in $\mathbb{Z}_n^*$. For each integer $m$, we define

$$(\mathbb{Z}_n^*)^m := \{\beta^m : \beta \in \mathbb{Z}_n^*\},$$

the set of $m$th powers in $\mathbb{Z}_n^*$. The set $(\mathbb{Z}_n^*)^m$ is non-empty, as it obviously contains $[1]$.

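Theorem 2.16. Let $n$ be a positive integer, let $\alpha, \beta \in \mathbb{Z}_n^*$, and let $m$ be any integer.

(i) If $\alpha \in (\mathbb{Z}_n^*)^m$, then $\alpha^{-1} \in (\mathbb{Z}_n^*)^m$.

(ii) If $\alpha \in (\mathbb{Z}_n^*)^m$ and $\beta \in (\mathbb{Z}_n^*)^m$, then $\alpha\beta \in (\mathbb{Z}_n^*)^m$.

(iii) If $\alpha \in (\mathbb{Z}_n^*)^m$ and $\beta \notin (\mathbb{Z}_n^*)^m$, then $\alpha\beta \notin (\mathbb{Z}_n^*)^m$.

Proof. For (i), if $\alpha = \gamma^m$, then $\alpha^{-1} = (\gamma^{-1})^m$.

For (ii), if $\alpha = \gamma^m$ and $\beta = \delta^m$, then $\alpha\beta = (\gamma\delta)^m$.

For (iii), suppose that $\alpha \in (\mathbb{Z}_n^*)^m$, $\beta \notin (\mathbb{Z}_n^*)^m$, and $\alpha\beta \in (\mathbb{Z}_n^*)^m$. Then by (i), $\alpha^{-1} \in (\mathbb{Z}_n^*)^m$, and by (ii), $\beta = \alpha^{-1}(\alpha\beta) \in (\mathbb{Z}_n^*)^m$, a contradiction. $\square$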

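Theorem 2.17. Let $n$ be a positive integer. For each $\alpha \in \mathbb{Z}_n^*$, and all $\ell, m \in \mathbb{Z}$ with $\gcd(\ell, m) = 1$, if $\alpha^\ell \in (\mathbb{Z}_n^*)^m$, then $\alpha \in (\mathbb{Z}_n^*)^m$.

Proof. Suppose $\alpha^\ell = \beta^m \in (\mathbb{Z}_n^*)^m$. Since $\gcd(\ell, m) = 1$, there exist integers $s$ and $t$ such that $\ell s + mt = 1$. We then have

$$\alpha = \alpha^{\ell s + mt} = \alpha^{\ell s}\alpha^{mt} = \beta^{ms}\alpha^{mt} = (\beta^s\alpha^t)^m \in (\mathbb{Z}_n^*)^m. \qquad \square$$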

We now focus on the squares in $\mathbb{Z}_n^*$, rather than general powers. An integer $a$ is called a quadratic residue modulo $n$ if $\gcd(a, n) = 1$ and $a \equiv b^2 \pmod{n}$ for some integer $b$; in this case, we say that $b$ is a square root of $a$ modulo $n$. In terms of residue classes, $a$ is a quadratic residue modulo $n$ if and only if $[a] \in (\mathbb{Z}_n^*)^2$.

To avoid some annoying technicalities, from now on, we shall consider only the case where $n$ is odd.
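For small $n$ the quadratic residues, together with their square roots, can be listed by brute force. The Python sketch below (hypothetical name `quadratic_residues`) squares only the units $b$, which is enough: if $\gcd(a, n) = 1$ and $b^2 \equiv a \pmod{n}$, then $\gcd(b, n) = 1$ as well.

```python
from math import gcd


def quadratic_residues(n):
    """Map each quadratic residue a in {0, ..., n-1} modulo n to the list of its
    square roots b in {0, ..., n-1}, by brute force."""
    roots = {}
    for b in range(n):
        # a square root of a unit must itself be a unit, so skip non-units b
        if gcd(b, n) == 1:
            roots.setdefault(b * b % n, []).append(b)
    return roots


print(quadratic_residues(15))   # {1: [1, 4, 11, 14], 4: [2, 7, 8, 13]}
```

Note that modulo the composite 15, the residue 1 has four square roots; the results for odd primes and odd prime powers in this section show that this cannot happen there.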


2.8.1 Quadratic residues modulo p 

We first study quadratic residues modulo an odd prime p, and we begin by deter- 
mining the square roots of 1 modulo p. 

Theorem 2.18. Let $p$ be an odd prime and $\beta \in \mathbb{Z}_p$. Then $\beta^2 = 1$ if and only if $\beta = \pm 1$.

Proof. Clearly, if $\beta = \pm 1$, then $\beta^2 = 1$. Conversely, suppose that $\beta^2 = 1$. Write $\beta = [b]$, where $b \in \mathbb{Z}$. Then we have $b^2 \equiv 1 \pmod{p}$, which means that

$$p \mid (b^2 - 1) = (b - 1)(b + 1),$$

and since $p$ is prime, we must have $p \mid (b - 1)$ or $p \mid (b + 1)$. This implies $b \equiv \pm 1 \pmod{p}$, or equivalently, $\beta = \pm 1$. $\square$

This theorem says that modulo $p$, the only square roots of 1 are 1 and $-1$, which obviously belong to distinct residue classes (since $p > 2$). From this seemingly trivial fact, a number of quite interesting and useful results may be derived.

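Theorem 2.19. Let $p$ be an odd prime and $\gamma, \beta \in \mathbb{Z}_p^*$. Then $\gamma^2 = \beta^2$ if and only if $\gamma = \pm\beta$.

Proof. This follows from the previous theorem:

$$\gamma^2 = \beta^2 \iff (\gamma/\beta)^2 = 1 \iff \gamma/\beta = \pm 1 \iff \gamma = \pm\beta. \qquad \square$$

This theorem says that if $\alpha = \beta^2$ for some $\beta \in \mathbb{Z}_p^*$, then $\alpha$ has precisely two square roots: $\beta$ and $-\beta$.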

Theorem 2.20. Let $p$ be an odd prime. Then $|(\mathbb{Z}_p^*)^2| = (p - 1)/2$.

Proof. By the previous theorem, the “squaring map” $\sigma : \mathbb{Z}_p^* \to \mathbb{Z}_p^*$ that sends $\beta$ to $\beta^2$ is a two-to-one map: every element in the image of $\sigma$ has precisely two pre-images. As a general principle, if we have a function $f : A \to B$, where $A$ is a finite set and every element in $f(A)$ has exactly $d$ pre-images, then $|f(A)| = |A|/d$. Applying this general principle to our setting, we see that the image of $\sigma$ is half the size of $\mathbb{Z}_p^*$. $\square$

Thus, for every odd prime $p$, exactly half the elements of $\mathbb{Z}_p^*$ are squares, and half are non-squares. If we choose our representatives for the residue classes modulo $p$ from the interval $[-p/2, p/2)$, we may list the elements of $\mathbb{Z}_p$ as

$$[-(p-1)/2], \ldots, [-1], [0], [1], \ldots, [(p-1)/2].$$

We then see that $\mathbb{Z}_p^*$ consists of the residue classes

$$[\pm 1], \ldots, [\pm (p-1)/2],$$

and so $(\mathbb{Z}_p^*)^2$ consists of the residue classes

$$[1]^2, \ldots, [(p-1)/2]^2,$$

which must be distinct, since we know that $|(\mathbb{Z}_p^*)^2| = (p - 1)/2$.

Example 2.10. Let $p = 7$. We can list the elements of $\mathbb{Z}_p^*$ as

$$[\pm 1], [\pm 2], [\pm 3].$$

Squaring these, we see that

$$(\mathbb{Z}_p^*)^2 = \{[1]^2, [2]^2, [3]^2\} = \{[1], [4], [2]\}. \qquad \square$$
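The observation that $[1]^2, \ldots, [(p-1)/2]^2$ are distinct and exhaust $(\mathbb{Z}_p^*)^2$ can be tested for a handful of primes. The Python fragment below is illustrative; the helper name `half_squares` and the list of primes are our own choices.

```python
def half_squares(p):
    """Representatives of [1]^2, ..., [(p-1)/2]^2 modulo an odd prime p."""
    return [(b * b) % p for b in range(1, (p - 1) // 2 + 1)]


for p in [3, 5, 7, 11, 13, 101]:            # illustrative odd primes
    sq = half_squares(p)
    all_squares = {(b * b) % p for b in range(1, p)}
    assert len(set(sq)) == (p - 1) // 2     # the listed squares are distinct
    assert set(sq) == all_squares           # and they make up all of (Z_p^*)^2
print(half_squares(7))   # [1, 4, 2]
```

The output for $p = 7$ reproduces Example 2.10.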
We next derive an extremely important characterization of quadratic residues.


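Theorem 2.21 (Euler’s criterion). Let $p$ be an odd prime and $\alpha \in \mathbb{Z}_p^*$.

(i) $\alpha^{(p-1)/2} = \pm 1$.

(ii) If $\alpha \in (\mathbb{Z}_p^*)^2$, then $\alpha^{(p-1)/2} = 1$.

(iii) If $\alpha \notin (\mathbb{Z}_p^*)^2$, then $\alpha^{(p-1)/2} = -1$.

Proof. For (i), let $\gamma = \alpha^{(p-1)/2}$. By Euler’s theorem (Theorem 2.13), we have

$$\gamma^2 = \alpha^{p-1} = 1,$$

and hence by Theorem 2.18, we have $\gamma = \pm 1$.

For (ii), suppose that $\alpha = \beta^2$. Then again by Euler’s theorem, we have

$$\alpha^{(p-1)/2} = (\beta^2)^{(p-1)/2} = \beta^{p-1} = 1.$$

For (iii), let $\alpha \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$. We study the product

$$\varepsilon := \prod_{\beta \in \mathbb{Z}_p^*} \beta.$$

We shall show that, on the one hand, $\varepsilon = \alpha^{(p-1)/2}$, while on the other hand, $\varepsilon = -1$.

To show that $\varepsilon = \alpha^{(p-1)/2}$, we group elements of $\mathbb{Z}_p^*$ into pairs of distinct elements whose product is $\alpha$. More precisely, let $P := \{S \subseteq \mathbb{Z}_p^* : |S| = 2\}$, and define $C := \{\{\kappa, \lambda\} \in P : \kappa\lambda = \alpha\}$. Note that for every $\kappa \in \mathbb{Z}_p^*$, there is a unique $\lambda \in \mathbb{Z}_p^*$ such that $\kappa\lambda = \alpha$, namely, $\lambda := \alpha/\kappa$; moreover, $\kappa \neq \lambda$, since otherwise, we would have $\kappa^2 = \alpha$, contradicting the assumption that $\alpha \notin (\mathbb{Z}_p^*)^2$. Thus, every element of $\mathbb{Z}_p^*$ belongs to exactly one pair in $C$; in other words, the elements of $C$ form a partition of $\mathbb{Z}_p^*$. Since $\mathbb{Z}_p^*$ has $p - 1$ elements, $C$ consists of exactly $(p-1)/2$ pairs, and it follows that

$$\varepsilon = \prod_{\{\kappa, \lambda\} \in C} (\kappa \cdot \lambda) = \prod_{\{\kappa, \lambda\} \in C} \alpha = \alpha^{(p-1)/2}.$$

To show that $\varepsilon = -1$, we group elements of $\mathbb{Z}_p^*$ into pairs of distinct elements whose product is $[1]$. Define $D := \{\{\kappa, \lambda\} \in P : \kappa\lambda = 1\}$. For every $\kappa \in \mathbb{Z}_p^*$, there exists a unique $\lambda \in \mathbb{Z}_p^*$ such that $\kappa\lambda = 1$, namely, $\lambda := \kappa^{-1}$; moreover, $\kappa = \lambda$ if and only if $\kappa^2 = 1$, and by Theorem 2.18, this happens if and only if $\kappa = \pm 1$. Thus, every element of $\mathbb{Z}_p^*$ except for $[\pm 1]$ belongs to exactly one pair in $D$; in other words, the elements of $D$ form a partition of $\mathbb{Z}_p^* \setminus \{[\pm 1]\}$. It follows that

$$\varepsilon = [1] \cdot [-1] \cdot \prod_{\{\kappa, \lambda\} \in D} (\kappa \cdot \lambda) = [-1] \cdot \prod_{\{\kappa, \lambda\} \in D} [1] = -1. \qquad \square$$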





Thus, Euler’s criterion says that for every $\alpha \in \mathbb{Z}_p^*$, we have $\alpha^{(p-1)/2} = \pm 1$ and
$\alpha \in (\mathbb{Z}_p^*)^2 \iff \alpha^{(p-1)/2} = 1$.
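As a quick numerical check of Euler’s criterion, the following Python sketch (an illustration, not part of the text) compares $a^{(p-1)/2} \bmod p$ with a brute-force list of squares. The function names `euler_sign` and `nonzero_squares` are hypothetical, and $p$ is assumed to be an odd prime not dividing $a$.

```python
def euler_sign(a, p):
    """Return +1 or -1 according to a^((p-1)/2) mod p.

    Assumes p is an odd prime and p does not divide a; by part (i) of
    Euler's criterion the power is then congruent to 1 or to -1 = p - 1.
    """
    r = pow(a, (p - 1) // 2, p)  # fast modular exponentiation
    if r not in (1, p - 1):
        raise ValueError("p is not an odd prime, or p divides a")
    return 1 if r == 1 else -1


def nonzero_squares(p):
    """Brute-force the set of squares of [1], ..., [p-1] modulo p."""
    return {x * x % p for x in range(1, p)}


p = 11
squares = nonzero_squares(p)
for a in range(1, p):
    # Parts (ii) and (iii): the sign is +1 exactly on the squares.
    assert (euler_sign(a, p) == 1) == (a in squares)
```

Computing $a^{(p-1)/2}$ with repeated squaring (Python’s three-argument `pow`) is far cheaper than listing all squares, which is what makes the criterion useful in practice.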

In the course of proving Euler’s criterion, we proved the following result, which 
we state here for completeness: 

Theorem 2.22 (Wilson’s theorem). Let $p$ be an odd prime. Then $\prod_{\beta \in \mathbb{Z}_p^*} \beta = -1$.

In the language of congruences, Wilson’s theorem may be stated as follows: 

$(p - 1)! \equiv -1 \pmod{p}$.
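Wilson’s theorem is easy to test numerically. The sketch below (illustrative only; `factorial_mod` and `wilson_holds` are hypothetical names) reduces modulo $n$ after every multiplication so intermediate values stay small. For composite $n$ the congruence fails, which is the converse that Exercise 2.36 asks the reader to prove.

```python
def factorial_mod(m, n):
    """Compute m! mod n, reducing after each multiplication."""
    acc = 1 % n
    for k in range(2, m + 1):
        acc = acc * k % n
    return acc


def wilson_holds(n):
    """Return True if (n - 1)! ≡ -1 (mod n), for an integer n > 1."""
    return factorial_mod(n - 1, n) == n - 1


print([n for n in range(2, 30) if wilson_holds(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

This is only a curiosity as a primality test: it needs about $n$ multiplications, exponentially many in the bit length of $n$.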

We also derive the following simple consequence of Theorem 2.21: 

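Theorem 2.23. Let $p$ be an odd prime and $\alpha, \beta \in \mathbb{Z}_p^*$. If $\alpha \notin (\mathbb{Z}_p^*)^2$ and $\beta \notin (\mathbb{Z}_p^*)^2$, then $\alpha\beta \in (\mathbb{Z}_p^*)^2$.

Proof. Suppose $\alpha \notin (\mathbb{Z}_p^*)^2$ and $\beta \notin (\mathbb{Z}_p^*)^2$. Then by Euler’s criterion, we have

$\alpha^{(p-1)/2} = -1$ and $\beta^{(p-1)/2} = -1$.

Therefore,

$(\alpha\beta)^{(p-1)/2} = \alpha^{(p-1)/2} \cdot \beta^{(p-1)/2} = [-1] \cdot [-1] = 1$,
which again by Euler’s criterion implies that $\alpha\beta \in (\mathbb{Z}_p^*)^2$. □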

This theorem, together with parts (ii) and (iii) of Theorem 2.16, gives us the
following simple rules regarding squares in $\mathbb{Z}_p^*$:

square × square = square,

square × non-square = non-square,

non-square × non-square = square.
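Taken together, the three rules say that the sign $\pm 1$ from Euler’s criterion is multiplicative. A brute-force check of this for small primes, in the same illustrative Python style (the names `sign` and `rules_hold` are hypothetical):

```python
def sign(a, p):
    """+1 if a is a square in Z_p^*, -1 otherwise (Euler's criterion; p odd prime, p ∤ a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def rules_hold(p):
    """Check sign(ab) == sign(a) * sign(b) for all a, b in Z_p^*."""
    return all(sign(a * b % p, p) == sign(a, p) * sign(b, p)
               for a in range(1, p) for b in range(1, p))


assert rules_hold(13) and rules_hold(17)
```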


2.8.2 Quadratic residues modulo $p^e$

We next study quadratic residues modulo $p^e$, where $p$ is an odd prime. The key is
to establish the analog of Theorem 2.18:

Theorem 2.24. Let $p$ be an odd prime, $e$ be a positive integer, and $\beta \in \mathbb{Z}_{p^e}$. Then
$\beta^2 = 1$ if and only if $\beta = \pm 1$.

Proof. Clearly, if $\beta = \pm 1$, then $\beta^2 = 1$. Conversely, suppose that $\beta^2 = 1$. Write
$\beta = [b]$, where $b \in \mathbb{Z}$. Then we have $b^2 \equiv 1 \pmod{p^e}$, which means that

$p^e \mid (b^2 - 1) = (b - 1)(b + 1)$.
In particular, $p \mid (b - 1)(b + 1)$, and so $p \mid (b - 1)$ or $p \mid (b + 1)$. Moreover, $p$
cannot divide both $b - 1$ and $b + 1$, as otherwise, it would divide their difference
$(b + 1) - (b - 1) = 2$, which is impossible (because $p$ is odd). So one of $b - 1$ and $b + 1$ is relatively prime to $p^e$, and since $p^e$ divides their product, $p^e$ must divide the other one. It follows that
$p^e \mid (b - 1)$ or $p^e \mid (b + 1)$, which means $\beta = \pm 1$. □

Theorems 2.19-2.23 generalize immediately from $\mathbb{Z}_p^*$ to $\mathbb{Z}_{p^e}^*$: we really used
nothing in the proofs of these theorems other than the fact that $\pm 1$ are the only
square roots of 1 modulo $p$ (with $\varphi(p^e)$ playing the role of $p - 1$). As such, we state the analogs of these theorems for
$\mathbb{Z}_{p^e}^*$ without proof.

Theorem 2.25. Let $p$ be an odd prime, $e$ be a positive integer, and $\gamma, \beta \in \mathbb{Z}_{p^e}^*$.
Then $\gamma^2 = \beta^2$ if and only if $\gamma = \pm\beta$.

Theorem 2.26. Let $p$ be an odd prime and $e$ be a positive integer. Then we have
$|(\mathbb{Z}_{p^e}^*)^2| = \varphi(p^e)/2$.

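Theorem 2.27. Let $p$ be an odd prime, $e$ be a positive integer, and $\alpha \in \mathbb{Z}_{p^e}^*$.

(i) $\alpha^{\varphi(p^e)/2} = \pm 1$.

(ii) If $\alpha \in (\mathbb{Z}_{p^e}^*)^2$ then $\alpha^{\varphi(p^e)/2} = 1$.

(iii) If $\alpha \notin (\mathbb{Z}_{p^e}^*)^2$ then $\alpha^{\varphi(p^e)/2} = -1$.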

Theorem 2.28. Let $p$ be an odd prime and $e$ be a positive integer. Then we have

$\prod_{\beta \in \mathbb{Z}_{p^e}^*} \beta = -1$.

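Theorem 2.29. Let $p$ be an odd prime, $e$ be a positive integer, and $\alpha, \beta \in \mathbb{Z}_{p^e}^*$. If
$\alpha \notin (\mathbb{Z}_{p^e}^*)^2$ and $\beta \notin (\mathbb{Z}_{p^e}^*)^2$, then $\alpha\beta \in (\mathbb{Z}_{p^e}^*)^2$.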

It turns out that an integer is a quadratic residue modulo p e if and only if it is a 
quadratic residue modulo p. 

Theorem 2.30. Let p be an odd prime, e be a positive integer, and a be any integer. 
Then a is a quadratic residue modulo p e if and only if a is a quadratic residue 
modulo p. 

Proof. Suppose that $a$ is a quadratic residue modulo $p^e$. Then $a$ is not divisible by
$p$ and $a \equiv b^2 \pmod{p^e}$ for some integer $b$. It follows that $a \equiv b^2 \pmod{p}$, and so $a$
is a quadratic residue modulo $p$.

Suppose that $a$ is not a quadratic residue modulo $p^e$. If $a$ is divisible by $p$, then
by definition $a$ is not a quadratic residue modulo $p$. So suppose $a$ is not divisible
by $p$. By Theorem 2.27 (recall that $\varphi(p^e) = p^{e-1}(p - 1)$), we have

$a^{p^{e-1}(p-1)/2} \equiv -1 \pmod{p^e}$.

This congruence holds modulo $p$ as well, and by Fermat’s little theorem (applied
$e - 1$ times),

$a \equiv a^p \equiv a^{p^2} \equiv \cdots \equiv a^{p^{e-1}} \pmod{p}$,

and so

$-1 \equiv a^{p^{e-1}(p-1)/2} \equiv a^{(p-1)/2} \pmod{p}$.

Theorem 2.21 therefore implies that a is not a quadratic residue modulo p. □ 
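Theorems 2.24 and 2.30 can both be spot-checked by brute force for small odd prime powers. In the sketch below (illustrative Python; the helper names are hypothetical), `is_qr` follows the definition as the proof above uses it: $a$ is a quadratic residue modulo $m$ when $\gcd(a, m) = 1$ and $a \equiv b^2 \pmod m$ for some integer $b$.

```python
from math import gcd


def is_qr(a, m):
    """a is a quadratic residue mod m: gcd(a, m) = 1 and a ≡ b^2 (mod m) for some b."""
    return gcd(a, m) == 1 and any((b * b - a) % m == 0 for b in range(m))


def roots_of_one(m):
    """All b in {0, ..., m-1} with b^2 ≡ 1 (mod m)."""
    return [b for b in range(m) if b * b % m == 1]


def qr_lifts(p, e):
    """Theorem 2.30: a is a QR mod p^e iff a is a QR mod p (brute force over 0 <= a < p^e)."""
    m = p ** e
    return all(is_qr(a, m) == is_qr(a, p) for a in range(m))


assert roots_of_one(27) == [1, 26]      # Theorem 2.24 with p = 3, e = 3
assert qr_lifts(3, 3) and qr_lifts(5, 2)
```

The brute-force search takes time quadratic in $p^e$, so it is only suitable for tiny moduli.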


2.8.3 Quadratic residues modulo n 

We now study quadratic residues modulo $n$, where $n$ is an arbitrary, odd integer,
with $n > 1$. Let

$n = p_1^{e_1} \cdots p_r^{e_r}$

be the prime factorization of $n$. Our main tools here are the Chinese remainder map

$\theta : \mathbb{Z}_n \to \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}}$,

introduced in Theorem 2.8, together with the results developed so far for quadratic
residues modulo odd prime powers.

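Let $\alpha \in \mathbb{Z}_n^*$ with $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$.

• On the one hand, suppose $\alpha = \beta^2$ for some $\beta \in \mathbb{Z}_n^*$. If $\theta(\beta) = (\beta_1, \ldots, \beta_r)$, we have

$(\alpha_1, \ldots, \alpha_r) = \theta(\alpha) = \theta(\beta^2) = (\beta_1^2, \ldots, \beta_r^2)$,

where we have used part (iii.c) of Theorem 2.8. It follows that $\alpha_i = \beta_i^2$ for each $i$.

• On the other hand, suppose that for each $i$, $\alpha_i = \beta_i^2$ for some $\beta_i \in \mathbb{Z}_{p_i^{e_i}}^*$. Then setting $\beta := \theta^{-1}(\beta_1, \ldots, \beta_r)$, we have

$\theta(\beta^2) = (\beta_1^2, \ldots, \beta_r^2) = (\alpha_1, \ldots, \alpha_r) = \theta(\alpha)$,

where we have again used part (iii.c) of Theorem 2.8, along with the fact that $\theta$ is bijective (to define $\beta$). Thus, $\theta(\alpha) = \theta(\beta^2)$, and again since $\theta$ is bijective, it follows that $\alpha = \beta^2$.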

We have shown that

$\alpha \in (\mathbb{Z}_n^*)^2 \iff \alpha_i \in (\mathbb{Z}_{p_i^{e_i}}^*)^2$ for $i = 1, \ldots, r$.

In particular, restricting $\theta$ to $(\mathbb{Z}_n^*)^2$ yields a one-to-one correspondence between
$(\mathbb{Z}_n^*)^2$ and

$(\mathbb{Z}_{p_1^{e_1}}^*)^2 \times \cdots \times (\mathbb{Z}_{p_r^{e_r}}^*)^2$,

and therefore, by Theorem 2.26 (and Theorem 2.9), we have

$|(\mathbb{Z}_n^*)^2| = \prod_{i=1}^{r} \left(\varphi(p_i^{e_i})/2\right) = \varphi(n)/2^r$.

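Now suppose that $\alpha = \beta^2$, with $\beta \in \mathbb{Z}_n^*$ and $\theta(\beta) = (\beta_1, \ldots, \beta_r)$. Consider an
arbitrary element $\gamma \in \mathbb{Z}_n^*$, with $\theta(\gamma) = (\gamma_1, \ldots, \gamma_r)$. Then we have

$\gamma^2 = \beta^2 \iff \theta(\gamma^2) = \theta(\beta^2)$

$\iff (\gamma_1^2, \ldots, \gamma_r^2) = (\beta_1^2, \ldots, \beta_r^2)$

$\iff (\gamma_1, \ldots, \gamma_r) = (\pm\beta_1, \ldots, \pm\beta_r)$ (by Theorem 2.25).

Therefore, $\alpha$ has precisely $2^r$ square roots, namely, $\theta^{-1}(\pm\beta_1, \ldots, \pm\beta_r)$.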
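The counting statements of this subsection can be checked by brute force for a small odd $n$ whose number $r$ of distinct prime factors is known. The sketch below (illustrative Python, hypothetical helper names) represents elements of $\mathbb{Z}_n^*$ by integers in $\{1, \ldots, n - 1\}$ coprime to $n$. The caller supplies $r$ by hand, since the code does not factor $n$.

```python
from math import gcd


def units(n):
    """Representatives of Z_n^*: 1 <= a < n with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def square_roots(alpha, n):
    """All gamma in Z_n^* with gamma^2 = alpha."""
    return [g for g in units(n) if g * g % n == alpha]


def check(n, r):
    """Check |(Z_n^*)^2| = phi(n)/2^r and that every square has exactly 2^r square roots."""
    U = units(n)
    squares = {g * g % n for g in U}
    return (len(squares) * 2 ** r == len(U)
            and all(len(square_roots(s, n)) == 2 ** r for s in squares))


assert check(105, 3)   # 105 = 3 * 5 * 7
assert check(45, 2)    # 45 = 3^2 * 5
```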


2.8.4 Square roots of $-1$ modulo $p$

Using Euler’s criterion, we can easily characterize those primes modulo which -1 
is a quadratic residue. This turns out to have a number of nice applications. 

Consider an odd prime p. The following theorem says that the question of 
whether -1 is a quadratic residue modulo p is decided by the residue class of p 
modulo 4. Since $p$ is odd, either $p \equiv 1 \pmod 4$ or $p \equiv 3 \pmod 4$.

Theorem 2.31. Let $p$ be an odd prime. Then $-1$ is a quadratic residue modulo $p$
if and only if $p \equiv 1 \pmod 4$.

Proof. By Euler’s criterion, $-1$ is a quadratic residue modulo $p$ if and only if
$(-1)^{(p-1)/2} \equiv 1 \pmod{p}$. If $p \equiv 1 \pmod 4$, then $(p - 1)/2$ is even, and so
$(-1)^{(p-1)/2} = 1$; if $p \equiv 3 \pmod 4$, then $(p - 1)/2$ is odd, and so $(-1)^{(p-1)/2} = -1$, which is not congruent to 1 modulo the odd prime $p$. □

In fact, when $p \equiv 1 \pmod 4$, any non-square in $\mathbb{Z}_p^*$ yields a square root of $-1$
modulo $p$, as follows:

Theorem 2.32. Let $p$ be a prime with $p \equiv 1 \pmod 4$, $\gamma \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$, and
$\beta := \gamma^{(p-1)/4}$. Then $\beta^2 = -1$.

Proof. This is a simple calculation, based on Euler’s criterion:

$\beta^2 = \gamma^{(p-1)/2} = -1$. □
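Theorems 2.31 and 2.32 together give a simple recipe for a square root of $-1$ modulo a prime $p \equiv 1 \pmod 4$: find any non-square and raise it to the power $(p - 1)/4$. A sketch in Python (the function name is hypothetical, and trying $\gamma = 2, 3, \ldots$ in turn is just one convenient way to find a non-square, not something the text prescribes):

```python
def sqrt_minus_one(p):
    """Return b with b^2 ≡ -1 (mod p), for a prime p ≡ 1 (mod 4), via Theorem 2.32.

    The non-square gamma is found by naive trial, an assumption of this sketch;
    the theorem itself works with any non-square.
    """
    if p % 4 != 1:
        raise ValueError("expected a prime p ≡ 1 (mod 4)")
    for gamma in range(2, p):
        if pow(gamma, (p - 1) // 2, p) == p - 1:  # Euler's criterion: non-square
            return pow(gamma, (p - 1) // 4, p)
    raise ValueError("no non-square found; p is probably not prime")


for p in (5, 13, 17, 29, 97, 101):
    b = sqrt_minus_one(p)
    assert b * b % p == p - 1
```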

The fact that $-1$ is a quadratic residue modulo primes $p \equiv 1 \pmod 4$ can be
used to prove Fermat’s theorem that such primes may be written as the sum of two
squares. To do this, we first need the following technical lemma:




Theorem 2.33 (Thue’s lemma). Let $n, b, r^*, t^* \in \mathbb{Z}$, with $0 < r^* \le n < r^* t^*$.
Then there exist $r, t \in \mathbb{Z}$ with

$r \equiv bt \pmod{n}$, $|r| < r^*$, and $0 < |t| < t^*$.

Proof. For $i = 0, \ldots, r^* - 1$ and $j = 0, \ldots, t^* - 1$, we define the number $v_{ij} := i - bj$.
Since we have defined $r^* t^*$ numbers, and $r^* t^* > n$, two of these numbers must lie
in the same residue class modulo $n$; that is, for some $(i_1, j_1) \neq (i_2, j_2)$, we have
$v_{i_1 j_1} \equiv v_{i_2 j_2} \pmod{n}$. Setting $r := i_1 - i_2$ and $t := j_1 - j_2$, this implies $r \equiv bt \pmod{n}$,
$|r| < r^*$, $|t| < t^*$, and that either $r \neq 0$ or $t \neq 0$. It only remains to show that $t \neq 0$.
Suppose to the contrary that $t = 0$. This would imply that $r \equiv 0 \pmod{n}$ and $r \neq 0$,
which is to say that $r$ is a non-zero multiple of $n$; however, this is impossible, since
$|r| < r^* \le n$. □
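The proof of Thue’s lemma is constructive: it suffices to enumerate the numbers $v_{ij}$ until two fall into the same residue class modulo $n$. A direct Python transcription (an illustration only; the name `thue` and the dictionary-based collision search are choices of this sketch) runs in time roughly proportional to $r^* t^*$:

```python
def thue(n, b, r_star, t_star):
    """Find (r, t) with r ≡ b*t (mod n), |r| < r_star, 0 < |t| < t_star.

    Mirrors the pigeonhole proof of Theorem 2.33: compute v_ij = i - b*j
    modulo n until two of them collide.
    """
    if not 0 < r_star <= n < r_star * t_star:
        raise ValueError("hypotheses of Thue's lemma not satisfied")
    seen = {}  # residue of v_ij mod n -> (i, j)
    for i in range(r_star):
        for j in range(t_star):
            v = (i - b * j) % n
            if v in seen:
                i2, j2 = seen[v]
                return i - i2, j - j2  # r, t as in the proof
            seen[v] = (i, j)
    raise AssertionError("unreachable: r_star * t_star > n forces a collision")


r, t = thue(101, 10, 11, 11)
assert (r - 10 * t) % 101 == 0 and abs(r) < 11 and 0 < abs(t) < 11
```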

Theorem 2.34 (Fermat’s two squares theorem). Let $p$ be an odd prime. Then
$p = r^2 + t^2$ for some $r, t \in \mathbb{Z}$ if and only if $p \equiv 1 \pmod 4$.

Proof. One direction is easy. Suppose $p \equiv 3 \pmod 4$. It is easy to see that the
square of every integer is congruent to either 0 or 1 modulo 4; therefore, the sum of
two squares is congruent to either 0, 1, or 2 modulo 4, and so cannot be congruent
to $p$ modulo 4 (let alone equal to $p$).

For the other direction, suppose $p \equiv 1 \pmod 4$. We know that $-1$ is a quadratic
residue modulo $p$, so let $b$ be an integer such that $b^2 \equiv -1 \pmod{p}$. Now apply
Theorem 2.33 with $n := p$, $b$ as just defined, and $r^* := t^* := \lfloor\sqrt{p}\rfloor + 1$. Evidently,
$\lfloor\sqrt{p}\rfloor + 1 > \sqrt{p}$, and hence $r^* t^* > p$. Also, since $p$ is prime, $\sqrt{p}$ is not an integer,
and so $\lfloor\sqrt{p}\rfloor < \sqrt{p} < p$; in particular, $r^* = \lfloor\sqrt{p}\rfloor + 1 \le p$. Thus, the hypotheses of
that theorem are satisfied, and therefore, there exist integers $r$ and $t$ such that

$r \equiv bt \pmod{p}$, $|r| \le \lfloor\sqrt{p}\rfloor < \sqrt{p}$, and $0 < |t| \le \lfloor\sqrt{p}\rfloor < \sqrt{p}$.

It follows that

$r^2 \equiv b^2 t^2 \equiv -t^2 \pmod{p}$.

Thus, $r^2 + t^2$ is a multiple of $p$ and $0 < r^2 + t^2 < 2p$. The only possibility is that
$r^2 + t^2 = p$. □
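Combining the pieces, the proof of Theorem 2.34 yields an algorithm for actually finding $r$ and $t$. The Python sketch below (illustrative; the name `two_squares` is hypothetical) inlines the square-root-of-$-1$ recipe from Theorem 2.32 and the pigeonhole search from the proof of Theorem 2.33. It is a direct transcription of the proofs, not an efficient method: the search takes time roughly proportional to $p$.

```python
from math import isqrt


def two_squares(p):
    """Return (r, t) with r^2 + t^2 = p, for a prime p ≡ 1 (mod 4).

    Follows the proof of Theorem 2.34: take b with b^2 ≡ -1 (mod p),
    then run the pigeonhole search of Thue's lemma with
    r* = t* = floor(sqrt(p)) + 1.
    """
    if p % 4 != 1:
        raise ValueError("expected a prime p ≡ 1 (mod 4)")
    # b := gamma^((p-1)/4) for the first non-square gamma (Theorem 2.32).
    b = next(pow(g, (p - 1) // 4, p) for g in range(2, p)
             if pow(g, (p - 1) // 2, p) == p - 1)
    bound = isqrt(p) + 1
    seen = {}
    for i in range(bound):
        for j in range(bound):
            v = (i - b * j) % p
            if v in seen:
                i2, j2 = seen[v]
                r, t = i - i2, j - j2
                # r^2 + t^2 is a multiple of p strictly between 0 and 2p.
                assert r * r + t * t == p
                return abs(r), abs(t)
            seen[v] = (i, j)
    raise AssertionError("unreachable for prime p")


for p in (5, 13, 17, 29, 37, 41, 97, 101):
    r, t = two_squares(p)
    assert r * r + t * t == p
```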

The fact that $-1$ is a quadratic residue modulo an odd prime $p$ only if $p \equiv 1 \pmod 4$
can be used to show there are infinitely many such primes.

Theorem 2.35. There are infinitely many primes $p \equiv 1 \pmod 4$.

Proof. Suppose there were only finitely many such primes, $p_1, \ldots, p_k$. Set $M := \prod_{i=1}^{k} p_i$ and $N := 4M^2 + 1$. Let $p$ be any prime dividing $N$. Evidently, $p$
is not among the $p_i$’s, since if it were, it would divide both $N$ and $4M^2$, and
so also $N - 4M^2 = 1$. Also, $p$ is clearly odd, since $N$ is odd. Moreover,
$(2M)^2 \equiv -1 \pmod{p}$; therefore, $-1$ is a quadratic residue modulo $p$, and so
$p \equiv 1 \pmod 4$, contradicting the assumption that $p_1, \ldots, p_k$ are the only such
primes. □
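The proof of Theorem 2.35 is also constructive: from any finite list of primes $\equiv 1 \pmod 4$ it produces a new one. A small Python illustration (hypothetical function names; trial division is used only because the numbers here are tiny):

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def another_prime_1_mod_4(primes):
    """Given primes p_1, ..., p_k ≡ 1 (mod 4), return a prime p ≡ 1 (mod 4) not
    among them, using N := 4 M^2 + 1 with M := p_1 ... p_k as in Theorem 2.35."""
    M = 1
    for q in primes:
        M *= q
    N = 4 * M * M + 1
    p = smallest_prime_factor(N)
    assert p % 4 == 1 and p not in primes  # guaranteed by the proof
    return p


assert another_prime_1_mod_4([5]) == 101
```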

For completeness, we also state the following fact: 

Theorem 2.36. There are infinitely many primes $p \equiv 3 \pmod 4$.

Proof. Suppose there were only finitely many such primes, $p_1, \ldots, p_k$. Set $M := \prod_{i=1}^{k} p_i$ and $N := 4M - 1$. Since $N$ is odd, every prime dividing $N$ is congruent to 1 or 3 modulo 4; and since $N \equiv 3 \pmod 4$, there must be some prime
$p \equiv 3 \pmod 4$ dividing $N$ (if all primes dividing $N$ were congruent to 1 modulo 4,
then so too would be their product $N$). Evidently, $p$ is not among the $p_i$’s, since if
it were, it would divide both $N$ and $4M$, and so also $4M - N = 1$. This contradicts
the assumption that $p_1, \ldots, p_k$ are the only primes congruent to 3 modulo 4. □

Exercise 2.33. Let $n, m \in \mathbb{Z}$, where $n > 0$, and let $d := \gcd(m, \varphi(n))$. Show
that:

(a) if $d = 1$, then $(\mathbb{Z}_n^*)^m = \mathbb{Z}_n^*$;

(b) if $\alpha \in (\mathbb{Z}_n^*)^m$, then $\alpha^{\varphi(n)/d} = 1$.

Exercise 2.34. Calculate the sets $C$ and $D$ in the proof of Theorem 2.21 in the
case $p = 11$ and $\alpha = -1$.

Exercise 2.35. Calculate the square roots of 1 modulo 4, 8, and 16. 

Exercise 2.36. Let $n \in \mathbb{Z}$ with $n > 1$. Show that $n$ is prime if and only if
$(n - 1)! \equiv -1 \pmod{n}$.

Exercise 2.37. Let $p$ be a prime with $p \equiv 1 \pmod 4$, and $b := ((p - 1)/2)!$.
Show that $b^2 \equiv -1 \pmod{p}$.

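Exercise 2.38. Let $n := pq$, where $p$ and $q$ are distinct, odd primes. Show that
there exist $\alpha, \beta \in \mathbb{Z}_n^*$ such that $\alpha \notin (\mathbb{Z}_n^*)^2$, $\beta \notin (\mathbb{Z}_n^*)^2$, and $\alpha\beta \notin (\mathbb{Z}_n^*)^2$.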

Exercise 2.39. Let n be an odd positive integer, and let a be any integer. Show 
that a is a quadratic residue modulo n if and only if a is a quadratic residue modulo 
$p$ for each prime $p \mid n$.

Exercise 2.40. Show that if $p$ is an odd prime, with $p \equiv 3 \pmod 4$, then
$(\mathbb{Z}_p^*)^4 = (\mathbb{Z}_p^*)^2$. More generally, show that if $n$ is an odd positive integer, where
$p \equiv 3 \pmod 4$ for each prime $p \mid n$, then $(\mathbb{Z}_n^*)^4 = (\mathbb{Z}_n^*)^2$.

Exercise 2.41. Let $p$ be an odd prime, and let $e \in \mathbb{Z}$ with $e > 1$. Let $a$ be an
integer of the form $a = p^f b$, where $0 < f < e$ and $p \nmid b$. Consider the integer
solutions $z$ to the congruence $z^2 \equiv a \pmod{p^e}$. Show that a solution exists if and
only if $f$ is even and $b$ is a quadratic residue modulo $p$, in which case there are
exactly $2p^{f/2}$ distinct solutions modulo $p^e$.

Exercise 2.42. Suppose $p$ is an odd prime, and that $r^2 + t^2 = p$ for some integers
$r, t$. Show that if $x, y$ are integers such that $x^2 + y^2 = p$, then $(x, y)$ must be $(\pm r, \pm t)$
or $(\pm t, \pm r)$.

Exercise 2.43. Show that if both u and v are the sum of two squares of integers, 
then so is their product uv. 

Exercise 2.44. Suppose $r^2 + t^2 \equiv 0 \pmod{n}$, where $n$ is a positive integer, and
suppose $p$ is an odd prime dividing $n$. Show that:

(a) if $p$ divides neither $r$ nor $t$, then $p \equiv 1 \pmod 4$;

(b) if $p$ divides one of $r$ or $t$, then it divides the other, and moreover, $p^2$ divides
$n$, and $(r/p)^2 + (t/p)^2 \equiv 0 \pmod{n/p^2}$.

Exercise 2.45. Let n be a positive integer, and write n = ab 2 where a and b are 
positive integers, and a is square-free (see Exercise 1.15). Show that n is the sum 
of two squares of integers if and only if no prime $p \equiv 3 \pmod 4$ divides $a$. Hint:
use the previous two exercises. 


2.9 Summations over divisors 

We close this chapter with a brief treatment of summations over divisors. To this 
end, we introduce some terminology and notation. By an arithmetic function, 
we simply mean a function from the positive integers into the reals (actually, one 
usually considers complex-valued functions as well, but we shall not do so here). 
Let $f$ and $g$ be arithmetic functions. The Dirichlet product of $f$ and $g$, denoted
$f \star g$, is the arithmetic function whose value at $n$ is defined by the formula

$(f \star g)(n) := \sum_{d \mid n} f(d)\, g(n/d)$,

the sum being over all positive divisors $d$ of $n$. Another, more symmetric, way to
write this is

$(f \star g)(n) = \sum_{n = d_1 d_2} f(d_1)\, g(d_2)$,

the sum being over all pairs $(d_1, d_2)$ of positive integers with $d_1 d_2 = n$.
The Dirichlet product is clearly commutative (i.e., $f \star g = g \star f$), and is associative as well, which one can see by checking that

$(f \star (g \star h))(n) = \sum_{n = d_1 d_2 d_3} f(d_1)\, g(d_2)\, h(d_3) = ((f \star g) \star h)(n)$,

the sum being over all triples $(d_1, d_2, d_3)$ of positive integers with $d_1 d_2 d_3 = n$.
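For experimentation, the Dirichlet product can be transcribed directly from its definition. In the Python sketch below (illustrative only), arithmetic functions are ordinary Python functions on positive integers, `divisors` uses naive trial division, and the test functions `f`, `g`, `h` are arbitrary choices. The final assertions spot-check commutativity and associativity for small $n$.

```python
def divisors(n):
    """Positive divisors of n, by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """The arithmetic function n -> sum over d | n of f(d) * g(n // d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


f = lambda n: n * n
g = lambda n: n + 1
h = lambda n: 2 * n - 1
assert all(dirichlet(f, g)(n) == dirichlet(g, f)(n) for n in range(1, 50))
assert all(dirichlet(f, dirichlet(g, h))(n) == dirichlet(dirichlet(f, g), h)(n)
           for n in range(1, 50))
```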

We now introduce three special arithmetic functions: $I$, $\mathbf{1}$, and $\mu$. The functions
$I$ and $\mathbf{1}$ are defined as follows:

$I(n) := 1$ if $n = 1$, and $I(n) := 0$ if $n > 1$;

$\mathbf{1}(n) := 1$ for all $n$.

The Möbius function $\mu$ is defined as follows: if $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$, then

$\mu(n) := 0$ if $e_i > 1$ for some $i = 1, \ldots, r$, and $\mu(n) := (-1)^r$ otherwise.


In other words, $\mu(n) = 0$ if $n$ is not square-free (see Exercise 1.15); otherwise,
$\mu(n)$ is $(-1)^r$ where $r$ is the number of distinct primes dividing $n$. Here are some
examples:

$\mu(1) = 1$, $\mu(2) = -1$, $\mu(3) = -1$, $\mu(4) = 0$, $\mu(5) = -1$, $\mu(6) = 1$.
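The definition of $\mu$ translates into a short trial-division routine. The Python sketch below is illustrative only (it is not an algorithm from the text, and it is slow for large $n$); it reproduces the examples just listed.

```python
def mobius(n):
    """mu(n) for n >= 1 by trial division: 0 if p^2 | n for some prime p, else (-1)^r."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:       # d^2 divided the original n
                return 0
            result = -result     # one more distinct prime factor
        d += 1
    if n > 1:                    # leftover prime factor
        result = -result
    return result


assert [mobius(n) for n in range(1, 7)] == [1, -1, -1, 0, -1, 1]
```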


It is easy to see from the definitions that for every arithmetic function $f$, we have

$I \star f = f$ and $(\mathbf{1} \star f)(n) = \sum_{d \mid n} f(d)$.

Thus, $I$ acts as a multiplicative identity with respect to the Dirichlet product, while
“$\mathbf{1} \star$” acts as a “summation over divisors” operator.

An arithmetic function $f$ is called multiplicative if $f(1) = 1$ and for all positive
integers $n, m$ with $\gcd(n, m) = 1$, we have $f(nm) = f(n) f(m)$.

The reader may easily verify that $I$, $\mathbf{1}$, and $\mu$ are multiplicative functions. Theorem 2.9 says that Euler’s function $\varphi$ is multiplicative. The reader may also verify
the following:

Theorem 2.37. If $f$ is a multiplicative arithmetic function, and if $n = p_1^{e_1} \cdots p_r^{e_r}$
is the prime factorization of $n$, then $f(n) = f(p_1^{e_1}) \cdots f(p_r^{e_r})$.

Proof. Exercise. □ 

A key property of the Möbius function is the following:
Theorem 2.38. Let $f$ be a multiplicative arithmetic function. If $n = p_1^{e_1} \cdots p_r^{e_r}$ is
the prime factorization of $n$, then

$\sum_{d \mid n} \mu(d) f(d) = (1 - f(p_1)) \cdots (1 - f(p_r))$.   (2.9)

Proof. The only non-zero terms appearing in the sum on the left-hand side of (2.9)
are those corresponding to divisors $d$ of the form $p_{i_1} \cdots p_{i_\ell}$, where $p_{i_1}, \ldots, p_{i_\ell}$ are
distinct; the value contributed to the sum by such a term is $(-1)^\ell f(p_{i_1} \cdots p_{i_\ell}) = (-1)^\ell f(p_{i_1}) \cdots f(p_{i_\ell})$. These are the same as the terms in the expansion of the
product on the right-hand side of (2.9). □


If we set $f := \mathbf{1}$ in the previous theorem, then we see that

$\sum_{d \mid n} \mu(d) = 1$ if $n = 1$, and $\sum_{d \mid n} \mu(d) = 0$ if $n > 1$.

Translating this into the language of Dirichlet products, we have

$\mathbf{1} \star \mu = I$.

Thus, with respect to the Dirichlet product, the functions $\mathbf{1}$ and $\mu$ are multiplicative
inverses of one another. Based on this, we may easily derive the following:


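Theorem 2.39 (Möbius inversion formula). Let $f$ and $F$ be arithmetic functions.
Then $F = \mathbf{1} \star f$ if and only if $f = \mu \star F$.

Proof. If $F = \mathbf{1} \star f$, then

$\mu \star F = \mu \star (\mathbf{1} \star f) = (\mu \star \mathbf{1}) \star f = I \star f = f$,
and conversely, if $f = \mu \star F$, then

$\mathbf{1} \star f = \mathbf{1} \star (\mu \star F) = (\mathbf{1} \star \mu) \star F = I \star F = F$. □

The Möbius inversion formula says this:

$F(n) = \sum_{d \mid n} f(d)$ for all positive integers $n$

$\iff f(n) = \sum_{d \mid n} \mu(d) F(n/d)$ for all positive integers $n$.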
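Möbius inversion is easy to confirm numerically: start from an arbitrary $f$, form $F = \mathbf{1} \star f$, and check that $\mu \star F$ gives back $f$. A Python sketch (illustrative; the helper names are hypothetical, and `divisors` and `mobius` use naive trial division):

```python
def divisors(n):
    """Positive divisors of n, by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """mu(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def summatory(f):
    """F = 1 * f, i.e. F(n) = sum of f(d) over d | n."""
    return lambda n: sum(f(d) for d in divisors(n))


def mobius_invert(F):
    """f = mu * F, i.e. f(n) = sum of mu(d) F(n/d) over d | n."""
    return lambda n: sum(mobius(d) * F(n // d) for d in divisors(n))


f = lambda n: n ** 3 - 7 * n + 2   # an arbitrary arithmetic function
recovered = mobius_invert(summatory(f))
assert all(recovered(n) == f(n) for n in range(1, 200))
```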

The Möbius inversion formula is a useful tool. As an application, we use it to
obtain a simple proof of the following fact:

Theorem 2.40. For every positive integer $n$, we have $\sum_{d \mid n} \varphi(d) = n$.
Proof. Let us define the arithmetic functions $N(n) := n$ and $M(n) := 1/n$. Our
goal is to show that $N = \mathbf{1} \star \varphi$, and by Möbius inversion, it suffices to show that
$\mu \star N = \varphi$. If $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$, we have

$(\mu \star N)(n) = \sum_{d \mid n} \mu(d)\,(n/d) = n \sum_{d \mid n} \mu(d)/d$

$= n \prod_{i=1}^{r} (1 - 1/p_i)$ (applying Theorem 2.38 with $f := M$, which is clearly multiplicative)

$= \varphi(n)$ (by Theorem 2.11). □
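Theorem 2.40, and the product formula for $\varphi$ from Theorem 2.11 used in its proof, can both be spot-checked numerically. The Python sketch below is illustrative only: `phi` counts units directly, while the hypothetical helper `phi_product` evaluates $n \prod_{p \mid n} (1 - 1/p)$ in exact integer arithmetic.

```python
from math import gcd


def phi(n):
    """Euler's phi function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def phi_product(n):
    """phi(n) = n * prod over primes p | n of (1 - 1/p), in exact integer arithmetic."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p   # multiply by (1 - 1/p); exact since p | result
        p += 1
    if m > 1:                       # leftover prime factor
        result -= result // m
    return result


for n in range(1, 300):
    assert sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n   # Theorem 2.40
    assert phi_product(n) == phi(n)                                  # Theorem 2.11
```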

Exercise 2.46. In our definition of a multiplicative function $f$, we made the
requirement that $f(1) = 1$. Show that if we dropped this requirement, the only
other function that would satisfy the definition would be the zero function (i.e., the 
function that is everywhere zero). 

Exercise 2.47. Let $f$ be a polynomial with integer coefficients, and for each
positive integer $n$, define $\omega_f(n)$ to be the number of integers $x \in \{0, \ldots, n - 1\}$
such that $f(x) \equiv 0 \pmod{n}$. Show that $\omega_f$ is multiplicative.

Exercise 2.48. Show that if $f$ and $g$ are multiplicative, then so is $f \star g$. Hint:
use Exercise 1.18.

Exercise 2.49. Let $\tau(n)$ be the number of positive divisors of $n$. Show that:

(a) $\tau$ is a multiplicative function;

(b) $\tau(n) = \prod_{i=1}^{r} (e_i + 1)$, where $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$;

(c) $\sum_{d \mid n} \mu(d)\, \tau(n/d) = 1$;

(d) $\sum_{d \mid n} \mu(d)\, \tau(d) = (-1)^r$, where $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$.

Exercise 2.50. Define $\sigma(n) := \sum_{d \mid n} d$. Show that:

(a) $\sigma$ is a multiplicative function;

(b) $\sigma(n) = \prod_{i=1}^{r} (p_i^{e_i + 1} - 1)/(p_i - 1)$, where $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$;

(c) $\sum_{d \mid n} \mu(d)\, \sigma(n/d) = n$;

(d) $\sum_{d \mid n} \mu(d)\, \sigma(d) = (-1)^r p_1 \cdots p_r$, where $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$.

Exercise 2.51. The Mangoldt function $\Lambda(n)$ is defined for all positive integers
$n$ as follows: $\Lambda(n) := \log p$, if $n = p^k$ for some prime $p$ and positive integer $k$, and
$\Lambda(n) := 0$, otherwise. Show that $\sum_{d \mid n} \Lambda(d) = \log n$, and from this, deduce that
$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d$.
Exercise 2.52. Show that if $f$ is multiplicative, and if $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime
factorization of $n$, then $\sum_{d \mid n} \mu(d)^2 f(d) = (1 + f(p_1)) \cdots (1 + f(p_r))$.

Exercise 2.53. Show that $n$ is square-free if and only if $\sum_{d \mid n} \mu(d)^2 \varphi(d) = n$.

Exercise 2.54. Show that for every arithmetic function $f$ with $f(1) \neq 0$, there
is a unique arithmetic function $g$, called the Dirichlet inverse of $f$, such that
$f \star g = I$. Also, show that if $f(1) = 0$, then $f$ has no Dirichlet inverse.

Exercise 2.55. Show that if $f$ is a multiplicative function, then so is its Dirichlet
inverse (as defined in the previous exercise).

Exercise 2.56. This exercise develops an alternative proof of Theorem 2.40 that
does not depend on Theorem 2.11. Let $n$ be a positive integer. Define

$F_n := \{i/n \in \mathbb{Q} : i = 0, \ldots, n - 1\}$.

Also, for each positive integer $d$, define

$G_d := \{a/d \in \mathbb{Q} : a \in \mathbb{Z}, \gcd(a, d) = 1\}$.

(a) Show that for each $x \in F_n$, there exists a unique positive divisor $d$ of $n$ such
that $x \in G_d$.

(b) Show that for each positive divisor $d$ of $n$, we have

$F_n \cap G_d = \{a/d : a = 0, \ldots, d - 1, \gcd(a, d) = 1\}$.

(c) Using (a) and (b), show that $\sum_{d \mid n} \varphi(d) = n$.

Exercise 2.57. Using Möbius inversion, directly derive Theorem 2.11 from Theorem 2.40.
In this chapter, we review standard asymptotic notation, introduce the formal computational model that we shall use throughout the rest of the text, and discuss basic
algorithms for computing with large integers.


3.1 Asymptotic notation 

We review some standard notation for relating the rate of growth of functions. 
This notation will be useful in discussing the running times of algorithms, and in a 
number of other contexts as well. 

Let $f$ and $g$ be real-valued functions. We shall assume that each is defined on
the set of non-negative integers, or, alternatively, that each is defined on the set
of non-negative reals. Actually, as we are only concerned about the behavior of
$f(x)$ and $g(x)$ as $x \to \infty$, we only require that $f(x)$ and $g(x)$ are defined for all
sufficiently large $x$ (the phrase “for all sufficiently large $x$” means “for some $x_0$
and all $x \ge x_0$”). We further assume that $g$ is eventually positive, meaning that
$g(x) > 0$ for all sufficiently large $x$. Then

• $f = O(g)$ means that $|f(x)| \le c\, g(x)$ for some positive constant $c$ and all
sufficiently large $x$ (read, “$f$ is big-O of $g$”),

• $f = \Omega(g)$ means that $f(x) \ge c\, g(x)$ for some positive constant $c$ and all
sufficiently large $x$ (read, “$f$ is big-Omega of $g$”),

• $f = \Theta(g)$ means that $c\, g(x) \le f(x) \le d\, g(x)$ for some positive constants $c$
and $d$ and all sufficiently large $x$ (read, “$f$ is big-Theta of $g$”),

• $f = o(g)$ means that $f(x)/g(x) \to 0$ as $x \to \infty$ (read, “$f$ is little-o of $g$”),

and

• $f \sim g$ means that $f(x)/g(x) \to 1$ as $x \to \infty$ (read, “$f$ is asymptotically
equal to $g$”).




Example 3.1. Let $f(x) := x^2$ and $g(x) := 2x^2 - 10x + 1$. Then $f = O(g)$ and
$f = \Omega(g)$. Indeed, $f = \Theta(g)$. □

Example 3.2. Let $f(x) := x^2$ and $g(x) := x^2 - 10x + 1$. Then $f \sim g$. □

Example 3.3. Let $f(x) := 100x^2$ and $g(x) := x^3$. Then $f = o(g)$. □
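Numerical ratios can build intuition for these examples, though they prove nothing about limits. The short Python sketch below (illustrative only) prints $f(x)/g(x)$ for Examples 3.1 to 3.3 at $x = 10^2, 10^4, 10^6$; the ratios approach $1/2$, $1$, and $0$ respectively.

```python
examples = {
    "3.1": (lambda x: x**2, lambda x: 2 * x**2 - 10 * x + 1),  # f = Theta(g); ratio -> 1/2
    "3.2": (lambda x: x**2, lambda x: x**2 - 10 * x + 1),      # f ~ g; ratio -> 1
    "3.3": (lambda x: 100 * x**2, lambda x: x**3),             # f = o(g); ratio -> 0
}

for name, (f, g) in examples.items():
    ratios = [f(x) / g(x) for x in (1e2, 1e4, 1e6)]
    print(name, [round(q, 6) for q in ratios])
```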

Note that by definition, if we write $f = \Omega(g)$, $f = \Theta(g)$, or $f \sim g$, it must be the
case that $f$ (in addition to $g$) is eventually positive; however, if we write $f = O(g)$
or $f = o(g)$, then $f$ need not be eventually positive.

When one writes “$f = O(g)$,” one should interpret “$\cdot = O(\cdot)$” as a binary relation between $f$ and $g$. Analogously for “$f = \Omega(g)$,” “$f = \Theta(g)$,” and “$f = o(g)$.”

One may also write “$O(g)$” in an expression to denote an anonymous function
$f$ such that $f = O(g)$. Analogously, $\Omega(g)$, $\Theta(g)$, and $o(g)$ may denote anonymous
functions. The expression $O(1)$ denotes a function bounded in absolute value by
a constant, while the expression $o(1)$ denotes a function that tends to zero in the
limit.

Example 3.4. Let $f(x) := x^3 - 2x^2 + x - 3$. One could write $f(x) = x^3 + O(x^2)$.
Here, the anonymous function is $g(x) := -2x^2 + x - 3$, and clearly $g(x) = O(x^2)$.
One could also write $f(x) = x^3 - (2 + o(1))x^2$. Here, the anonymous function
is $g(x) := -1/x + 3/x^2$. While $g = o(1)$, it is only defined for $x > 0$. This
is acceptable, since we will only regard statements such as this asymptotically, as
$x \to \infty$. □

As an even further use (abuse?) of the notation, one may use the big-O, big- 
Omega, and big-Theta notation for functions on an arbitrary domain, in which case 
the relevant inequalities should hold throughout the entire domain. This usage 
includes functions of several independent variables, as well as functions defined 
on sets with no natural ordering. 

Exercise 3.1. Show that:

(a) $f = o(g)$ implies $f = O(g)$ and $g \neq O(f)$;

(b) $f = O(g)$ and $g = O(h)$ implies $f = O(h)$;

(c) $f = O(g)$ and $g = o(h)$ implies $f = o(h)$;

(d) $f = o(g)$ and $g = O(h)$ implies $f = o(h)$.

Exercise 3.2. Let $f$ and $g$ be eventually positive functions. Show that:

(a) $f \sim g$ if and only if $f = (1 + o(1))g$;

(b) $f \sim g$ implies $f = \Theta(g)$;

(c) $f = \Theta(g)$ if and only if $f = O(g)$ and $f = \Omega(g)$;

(d) $f = \Omega(g)$ if and only if $g = O(f)$.

Exercise 3.3. Suppose $f_1 = O(g_1)$ and $f_2 = O(g_2)$. Show that $f_1 + f_2 = O(\max(g_1, g_2))$, $f_1 f_2 = O(g_1 g_2)$, and that for every constant $c$, $c f_1 = O(g_1)$.

Exercise 3.4. Suppose that $f(x) \le c + d\, g(x)$ for some positive constants $c$ and
$d$, and for all sufficiently large $x$. Show that if $g = \Omega(1)$, then $f = O(g)$.

Exercise 3.5. Suppose $f$ and $g$ are defined on the integers $i \ge k$, and that
$g(i) > 0$ for all $i \ge k$. Show that if $f = O(g)$, then there exists a positive constant
$c$ such that $|f(i)| \le c\, g(i)$ for all $i \ge k$.

Exercise 3.6. Let $f$ and $g$ be eventually positive functions, and assume that
$f(x)/g(x)$ tends to a limit $L$ (possibly $L = \infty$) as $x \to \infty$. Show that:

(a) if $L = 0$, then $f = o(g)$;

(b) if $0 < L < \infty$, then $f = \Theta(g)$;

(c) if $L = \infty$, then $g = o(f)$.

Exercise 3.7. Let $f(x) := x^\alpha (\log x)^\beta$ and $g(x) := x^\gamma (\log x)^\delta$, where $\alpha, \beta, \gamma, \delta$
are non-negative constants. Show that if $\alpha < \gamma$, or if $\alpha = \gamma$ and $\beta < \delta$, then
$f = o(g)$.

Exercise 3.8. Order the following functions in $x$ so that for each adjacent pair
$f, g$ in the ordering, we have $f = O(g)$, and indicate if $f = o(g)$, $f \sim g$, or
$g = O(f)$:

$x^3$, $e^x x^2$, $1/x$, $x^2(x + 100) + 1/x$, $x + \sqrt{x}$, $\log_2 x$, $\log_3 x$, $2x^2$, $x$,
$e^{-x}$, $2x^2 - 10x + 4$, $e^{x + \sqrt{x}}$, $2^x$, $3^x$, $x^{-2}$, $x^2 (\log x)^{1000}$.

Exercise 3.9. Show that:

(a) the relation “$\sim$” is an equivalence relation on the set of eventually positive
functions;

(b) for all eventually positive functions $f_1, f_2, g_1, g_2$, if $f_1 \sim g_1$ and $f_2 \sim g_2$,
then $f_1 \star f_2 \sim g_1 \star g_2$, where “$\star$” denotes addition, multiplication, or
division;

(c) for all eventually positive functions $f, g$, and every $\alpha > 0$, if $f \sim g$, then
$f^\alpha \sim g^\alpha$;

(d) for all eventually positive functions $f, g$, and every function $h$ such that
$h(x) \to \infty$ as $x \to \infty$, if $f \sim g$, then $f \circ h \sim g \circ h$, where “$\circ$” denotes
function composition.

Exercise 3.10. Show that all of the claims in the previous exercise also hold
when the relation “$\sim$” is replaced with the relation “$\cdot = \Theta(\cdot)$.”
Exercise 3.11. Let $f, g$ be eventually positive functions. Show that:

(a) $f = \Theta(g)$ if and only if $\log f = \log g + O(1)$;

(b) $f \sim g$ if and only if $\log f = \log g + o(1)$.

Exercise 3.12. Suppose that $f$ and $g$ are functions defined on the integers
$k, k + 1, \ldots$, and that $g$ is eventually positive. For $n \ge k$, define $F(n) := \sum_{i=k}^{n} f(i)$
and $G(n) := \sum_{i=k}^{n} g(i)$. Show that if $f = O(g)$ and $G$ is eventually positive, then
$F = O(G)$.

Exercise 3.13. Suppose that $f$ and $g$ are piece-wise continuous on $[a, \infty)$ (see
§A4), and that $g$ is eventually positive. For $x \ge a$, define $F(x) := \int_a^x f(t)\,dt$ and
$G(x) := \int_a^x g(t)\,dt$. Show that if $f = O(g)$ and $G$ is eventually positive, then
$F = O(G)$.

Exercise 3.14. Suppose that $f$ and $g$ are functions defined on the integers
$k, k + 1, \ldots$, and that both $f$ and $g$ are eventually positive. For $n \ge k$, define
$F(n) := \sum_{i=k}^{n} f(i)$ and $G(n) := \sum_{i=k}^{n} g(i)$. Show that if $f \sim g$ and $G(n) \to \infty$ as
$n \to \infty$, then $F \sim G$.

Exercise 3.15. Suppose that $f$ and $g$ are piece-wise continuous on $[a, \infty)$ (see
§A4), and that both $f$ and $g$ are eventually positive. For $x \ge a$, define $F(x) := \int_a^x f(t)\,dt$ and $G(x) := \int_a^x g(t)\,dt$. Show that if $f \sim g$ and $G(x) \to \infty$ as $x \to \infty$,
then $F \sim G$.

Exercise 3.16. Give an example of two non-decreasing functions $f$ and $g$, each
mapping positive integers to positive integers, such that $f \neq O(g)$ and $g \neq O(f)$.


3.2 Machine models and complexity theory 

When presenting an algorithm, we shall always use a high-level, and somewhat 
informal, notation. However, all of our high-level descriptions can be routinely 
translated into the machine-language of an actual computer. So that our theorems 
on the running times of algorithms have a precise mathematical meaning, we for- 
mally define an “idealized” computer: the random access machine or RAM. 

A RAM consists of an unbounded sequence of memory cells

$m[0], m[1], m[2], \ldots,$

each of which can store an arbitrary integer, together with a program. A program
consists of a finite sequence of instructions $I_0, I_1, \ldots$, where each instruction is of
one of the following types:
arithmetic: This type of instruction is of the form $\gamma \leftarrow \alpha \star \beta$, where $\star$ represents one
of the operations addition, subtraction, multiplication, or integer division
(i.e., $\lfloor \cdot / \cdot \rfloor$). The values $\alpha$ and $\beta$ are of the form $c$, $m[a]$, or $m[m[a]]$, and
$\gamma$ is of the form $m[a]$ or $m[m[a]]$, where $c$ is an integer constant and $a$ is a
non-negative integer constant. Execution of this type of instruction causes
the value $\alpha \star \beta$ to be evaluated and then stored in $\gamma$.

branching: This type of instruction is of the form IF $\alpha \diamond \beta$ GOTO $i$, where $i$ is
the index of an instruction, and where $\diamond$ is one of the comparison operations $=, \neq, <, >, \le, \ge$, and $\alpha$ and $\beta$ are as above. Execution of this type of
instruction causes the “flow of control” to pass conditionally to instruction
$I_i$ (if the comparison holds; otherwise execution continues with the next instruction).

halt: The HALT instruction halts the execution of the program.

A RAM works by executing instruction $I_0$, and continues to execute instructions, following branching instructions as appropriate, until a HALT instruction is
reached.
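To make the model concrete, here is a small RAM interpreter in Python. It is a sketch, not part of the text: the tuple encoding of instructions and operands (`("const", c)`, `("mem", a)` for $m[a]$, `("ind", a)` for $m[m[a]]$) is a hypothetical choice of this example, memory is a dictionary defaulting to 0 so that it is unbounded, and every executed instruction (including HALT) is charged one unit of time.

```python
from collections import defaultdict
import operator

ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.floordiv}   # "/" is floor division
COMPARE = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
           ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def run_ram(program, memory=None, max_steps=10**6):
    """Run a RAM program starting at instruction 0; return (memory, steps)."""
    m = defaultdict(int, memory or {})   # unbounded memory, cells default to 0

    def value(opd):
        kind, x = opd
        if kind == "const":
            return x                      # integer constant c
        if kind == "mem":
            return m[x]                   # m[a]
        if kind == "ind":
            return m[m[x]]                # m[m[a]]
        raise ValueError(f"bad operand {opd!r}")

    def address(opd):
        kind, x = opd
        if kind == "mem":
            return x
        if kind == "ind":
            return m[x]
        raise ValueError(f"bad destination {opd!r}")

    pc = steps = 0
    while steps < max_steps:
        instr = program[pc]
        steps += 1                        # one unit of time per instruction
        if instr[0] == "halt":
            return m, steps
        if instr[0] == "arith":
            _, dest, op, a, b = instr
            m[address(dest)] = ARITH[op](value(a), value(b))
            pc += 1
        elif instr[0] == "if":
            _, a, rel, b, target = instr
            pc = target if COMPARE[rel](value(a), value(b)) else pc + 1
        else:
            raise ValueError(f"bad instruction {instr!r}")
    raise RuntimeError("step limit exceeded")


# Sum 1 + 2 + ... + m[0] into m[1], using m[2] as a counter.
program = [
    ("arith", ("mem", 1), "+", ("const", 0), ("const", 0)),  # 0: m[1] <- 0
    ("arith", ("mem", 2), "+", ("const", 1), ("const", 0)),  # 1: m[2] <- 1
    ("if", ("mem", 2), ">", ("mem", 0), 6),                  # 2: IF m[2] > m[0] GOTO 6
    ("arith", ("mem", 1), "+", ("mem", 1), ("mem", 2)),      # 3: m[1] <- m[1] + m[2]
    ("arith", ("mem", 2), "+", ("mem", 2), ("const", 1)),    # 4: m[2] <- m[2] + 1
    ("if", ("const", 0), "=", ("const", 0), 2),              # 5: GOTO 2 unconditionally
    ("halt",),                                               # 6: HALT
]
memory, steps = run_ram(program, {0: 10})
print(memory[1], steps)   # 55 44
```

The step count grows linearly with $m[0]$ here, which matches the cost model above: each loop iteration executes four instructions.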

We do not specify input or output instructions, and instead assume that the input 
and output are to be found in memory cells at some prescribed locations, in some 
standardized format. 

To determine the running time of a program on a given input, we charge 1 unit 
of time to each instruction executed. 

This model of computation closely resembles a typical modern-day computer, 
except that we have abstracted away many annoying details. However, there are
two details of real machines that cannot be ignored; namely, any real machine has 
a finite number of memory cells, and each cell can store numbers only in some 
fixed range. 

The first limitation must be dealt with by either purchasing sufficient memory or 
designing more space-efficient algorithms. 

The second limitation is especially annoying, as we will want to perform compu- 
tations with quite large integers — much larger than will fit into any single memory 
cell of an actual machine. To deal with this limitation, we shall represent such large 
integers as vectors of digits in some fixed base, so that each digit is bounded in 
order to fit into a memory cell. This is discussed in more detail in the next section. 
The only other numbers we actually need to store in memory cells are “small”
numbers representing array indices, counters, and the like, which we hope will fit 
into the memory cells of actual machines. Below, we shall make a more precise, 
formal restriction on the magnitude of numbers that may be stored in memory cells. 

Even with these caveats and restrictions, the running time as we have defined 
it for a RAM is still only a rough predictor of performance on an actual machine. 
On a real machine, different instructions may take significantly different amounts 
of time to execute; for example, a division instruction may take much longer than
an addition instruction. Also, on a real machine, the behavior of the cache may 
significantly affect the time it takes to load or store the operands of an instruction. 
Finally, the precise running time of an algorithm given by a high-level description 
will depend on the quality of the translation of this algorithm into “machine code.” 
However, despite all of these problems, it still turns out that measuring the running 
time on a RAM as we propose here is a good “first order” predictor of performance 
on real machines in many cases. Also, we shall only state the running time of an 
algorithm using a big-O estimate, so that implementation-specific constant factors
are anyway “swept under the rug.”

If we have an algorithm for solving a certain problem, we expect that “larger” 
instances of the problem will require more time to solve than “smaller” instances, 
and a general goal in the analysis of any algorithm is to estimate the rate of growth 
of the running time of the algorithm as a function of the size of its input. For this 
purpose, we shall simply measure the size of an input as the number of memory
cells used to represent it. Theoretical computer scientists sometimes equate the
notion of “efficient” with “polynomial time” (although not everyone takes theoretical
computer scientists very seriously, especially on this point): a polynomial-time
algorithm is one whose running time on inputs of size $n$ is at most $an^b + c$,
for some constants $a$, $b$, and $c$ (a “real” theoretical computer scientist will write
this as $n^{O(1)}$). Furthermore, we also require that for a polynomial-time algorithm,
all numbers stored in memory are at most $a'n^{b'} + c'$ in absolute value, for some
constants $a'$, $b'$, and $c'$. Even for algorithms that are not polynomial time, we shall
insist that after executing $t$ instructions, all numbers stored in memory are at most
$a'(n + t)^{b'} + c'$ in absolute value, for some constants $a'$, $b'$, and $c'$.

Note that in defining the notion of polynomial time on a RAM, it is essential 
that we restrict the magnitude of numbers that may be stored in the machine’s 
memory cells, as we have done above. Without this restriction, a program could 
perform arithmetic on huge numbers, being charged just one unit of time for each 
arithmetic operation — not only is this intuitively “wrong,” it is possible to come up 
with programs that solve some problems using a polynomial number of arithmetic 
operations on huge numbers, and these problems cannot otherwise be solved in 
polynomial time (see §3.6). 


3.3 Basic integer arithmetic 

We will need algorithms for performing arithmetic on very large integers. Since 
such integers will exceed the word-size of actual machines, and to satisfy the for- 
mal requirements of our random access model of computation, we shall represent 
large integers as vectors of digits in some base $B$, along with a bit indicating the
sign. That is, for $a \in \mathbb{Z}$, if we write

$$a = \pm \sum_{i=0}^{k-1} a_i B^i = \pm (a_{k-1} \cdots a_1 a_0)_B,$$

where $0 \le a_i < B$ for $i = 0, \ldots, k-1$, then $a$ will be represented in memory as
a data structure consisting of the vector of base-$B$ digits $a_0, \ldots, a_{k-1}$, along with
a “sign bit” to indicate the sign of $a$. To ensure a unique representation, if $a$ is
non-zero, then the high-order digit $a_{k-1}$ in this representation should be non-zero.

For our purposes, we shall consider $B$ to be a constant, and moreover, a power of
2. The choice of $B$ as a power of 2 is convenient for a number of technical reasons.
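To make the representation concrete, here is a minimal Python sketch, assuming $B = 2^{15}$ (one admissible power of 2) and storing the digits low-order first. The helper names `to_digits` and `from_digits` are hypothetical illustrations, not part of the text; Python's built-in integers are used only to produce and check digit vectors.

```python
B = 2 ** 15  # assumed base: a power of 2, as the text requires


def to_digits(a: int) -> tuple[int, list[int]]:
    """Return (sign, digits) with digits a_0, ..., a_{k-1}, low-order first.

    sign is +1 or -1; for a != 0 the high-order digit is non-zero,
    and zero is represented by sign +1 and the single digit 0.
    """
    sign = -1 if a < 0 else 1
    a = abs(a)
    digits = []
    while True:
        a, d = divmod(a, B)  # peel off the lowest base-B digit
        digits.append(d)
        if a == 0:
            return sign, digits


def from_digits(sign: int, digits: list[int]) -> int:
    """Evaluate sign * sum(a_i * B**i) by Horner's rule."""
    value = 0
    for d in reversed(digits):
        value = value * B + d
    return sign * value
```

For example, `to_digits(-(2**15 + 5))` gives `(-1, [5, 1])`, since $2^{15} + 5 = 1 \cdot B + 5$.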

A note to the reader: If you are not interested in the low-level details of algo- 
rithms for integer arithmetic, or are willing to take them on faith, you may safely 
skip ahead to §3.3.5, where the results of this section are summarized. 

We now discuss in detail basic arithmetic algorithms for unsigned (i.e., non-negative)
integers: these algorithms work with vectors of base-$B$ digits, and
except where explicitly noted, we do not assume that the high-order digits of the 
input vectors are non-zero, nor do these algorithms ensure that the high-order digit 
of the output vector is non-zero. These algorithms can be very easily adapted to 
deal with arbitrary signed integers, and to take proper care that the high-order digit 
of the vector representing a non-zero number is itself non-zero (the reader is asked 
to fill in these details in some of the exercises below). All of these algorithms
can be implemented directly in a programming language that provides a “built-in”
signed integer type that can represent all integers of absolute value less than $B^2$, and
that supports the basic arithmetic operations (addition, subtraction, multiplication,
integer division). So, for example, using the C or Java programming language’s
int type on a typical 32-bit computer, we could take $B = 2^{15}$. The resulting
software would be reasonably efficient and portable, but certainly not the fastest
possible.

Suppose we have the base-$B$ representations of two unsigned integers $a$ and $b$.
We present algorithms to compute the base-$B$ representation of $a + b$, $a - b$, $a \cdot b$,
$\lfloor a/b \rfloor$, and $a \bmod b$. To simplify the presentation, for integers $x, y$ with $y \ne 0$, we
denote by QuoRem$(x, y)$ the quotient/remainder pair $(\lfloor x/y \rfloor, x \bmod y)$.


3.3.1 Addition 

Let $a = (a_{k-1} \cdots a_0)_B$ and $b = (b_{\ell-1} \cdots b_0)_B$ be unsigned integers. Assume that
$k \ge \ell \ge 1$ (if $k < \ell$, then we can just swap $a$ and $b$). The sum $c := a + b$ is of the
form $c = (c_k c_{k-1} \cdots c_0)_B$. Using the standard “paper-and-pencil” method (adapted
from base-10 to base-$B$, of course), we can compute the base-$B$ representation of
$a + b$ in time $O(k)$, as follows:

    carry ← 0
    for i ← 0 to ℓ − 1 do
        tmp ← a_i + b_i + carry, (carry, c_i) ← QuoRem(tmp, B)
    for i ← ℓ to k − 1 do
        tmp ← a_i + carry, (carry, c_i) ← QuoRem(tmp, B)
    c_k ← carry

Note that in every loop iteration, the value of carry is 0 or 1, and the value tmp
lies between 0 and $2B - 1$.
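The addition loop translates almost line for line into Python. This is a sketch assuming $B = 2^{15}$ and low-order-first digit lists; `quo_rem` is a hypothetical helper wrapping Python's `divmod`, which floors the quotient and so agrees with QuoRem for a positive divisor.

```python
B = 2 ** 15  # assumed base (a power of 2)


def quo_rem(x: int, y: int) -> tuple[int, int]:
    """QuoRem(x, y) = (floor(x / y), x mod y); Python's divmod floors."""
    return divmod(x, y)


def add_digits(a: list[int], b: list[int]) -> list[int]:
    """Add unsigned digit vectors (low-order first), following the text."""
    if len(a) < len(b):
        a, b = b, a  # ensure k >= l
    k, l = len(a), len(b)
    c = [0] * (k + 1)
    carry = 0
    for i in range(l):
        tmp = a[i] + b[i] + carry  # 0 <= tmp <= 2B - 1
        carry, c[i] = quo_rem(tmp, B)
    for i in range(l, k):
        tmp = a[i] + carry
        carry, c[i] = quo_rem(tmp, B)
    c[k] = carry  # c_k is 0 or 1
    return c
```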


3.3.2 Subtraction 

Let $a = (a_{k-1} \cdots a_0)_B$ and $b = (b_{\ell-1} \cdots b_0)_B$ be unsigned integers. Assume that
$k \ge \ell \ge 1$. To compute the difference $c := a - b$, we may use the same algorithm
as above, but with the expression “$a_i + b_i$” replaced by “$a_i - b_i$.” In every loop
iteration, the value of carry is 0 or $-1$, and the value of tmp lies between $-B$ and
$B - 1$. If $a \ge b$, then $c_k = 0$ (i.e., there is no carry out of the last loop iteration);
otherwise, $c_k = -1$ (and $b - a = B^k - (c_{k-1} \cdots c_0)_B$, which can be computed with
another execution of the subtraction routine).
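The same loop with $a_i - b_i$ gives subtraction. In this sketch (again assuming $B = 2^{15}$ and low-order-first lists), Python's `divmod` rounds toward $-\infty$, so a negative tmp yields carry $-1$ and a digit in $[0, B)$, as the text describes; `sub_digits` is an illustrative name.

```python
B = 2 ** 15  # assumed base (a power of 2)


def sub_digits(a: list[int], b: list[int]) -> list[int]:
    """Return c with len(a) + 1 entries; c[k] is 0 if a >= b, else -1.

    Assumes len(a) >= len(b) >= 1. Because divmod floors, each digit
    c[i] lands in [0, B) even when tmp is negative.
    """
    k, l = len(a), len(b)
    c = [0] * (k + 1)
    carry = 0
    for i in range(l):
        tmp = a[i] - b[i] + carry  # -B <= tmp <= B - 1
        carry, c[i] = divmod(tmp, B)  # carry is 0 or -1
    for i in range(l, k):
        tmp = a[i] + carry
        carry, c[i] = divmod(tmp, B)
    c[k] = carry
    return c
```

When `c[k] == -1`, the low-order digits hold $B^k - (b - a)$, matching the remark above.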


3.3.3 Multiplication 

Let $a = (a_{k-1} \cdots a_0)_B$ and $b = (b_{\ell-1} \cdots b_0)_B$ be unsigned integers, with $k \ge 1$ and
$\ell \ge 1$. The product $c := a \cdot b$ is of the form $(c_{k+\ell-1} \cdots c_0)_B$, and may be computed
in time $O(k\ell)$ as follows:

    for i ← 0 to k + ℓ − 1 do c_i ← 0
    for i ← 0 to k − 1 do
        carry ← 0
        for j ← 0 to ℓ − 1 do
            tmp ← a_i b_j + c_{i+j} + carry
            (carry, c_{i+j}) ← QuoRem(tmp, B)
        c_{i+ℓ} ← carry

Note that at every step in the above algorithm, the value of carry lies between 0
and $B - 1$, and the value of tmp lies between 0 and $B^2 - 1$.
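A direct Python transcription of the schoolbook multiplication loop, under the same assumptions ($B = 2^{15}$, digits low-order first); `mul_digits` is an illustrative name.

```python
B = 2 ** 15  # assumed base (a power of 2)


def mul_digits(a: list[int], b: list[int]) -> list[int]:
    """Schoolbook product of unsigned digit vectors (low-order first)."""
    k, l = len(a), len(b)
    c = [0] * (k + l)
    for i in range(k):
        carry = 0
        for j in range(l):
            tmp = a[i] * b[j] + c[i + j] + carry  # 0 <= tmp <= B*B - 1
            carry, c[i + j] = divmod(tmp, B)
        c[i + l] = carry  # carry is at most B - 1
    return c
```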





Computing with large integers 

3.3.4 Division with remainder 


Let $a = (a_{k-1} \cdots a_0)_B$ and $b = (b_{\ell-1} \cdots b_0)_B$ be unsigned integers, with $k \ge 1$,
$\ell \ge 1$, and $b_{\ell-1} \ne 0$. We want to compute $q$ and $r$ such that $a = bq + r$ and
$0 \le r < b$. Assume that $k \ge \ell$; otherwise, $a < b$, and we can just set $q \leftarrow 0$ and
$r \leftarrow a$. The quotient $q$ will have at most $m := k - \ell + 1$ base-$B$ digits. Write
$q = (q_{m-1} \cdots q_0)_B$.

At a high level, the strategy we shall use to compute $q$ and $r$ is the following:

    r ← a
    for i ← m − 1 down to 0 do
        q_i ← ⌊r / B^i b⌋
        r ← r − B^i · q_i b

One easily verifies by induction that at the beginning of each loop iteration, we
have $0 \le r < B^{i+1} b$, and hence each $q_i$ will be between 0 and $B - 1$, as required.

Turning the above strategy into a detailed algorithm takes a bit of work. In
particular, we want an easy way to compute $\lfloor r / B^i b \rfloor$. Now, we could in theory
just try all possible choices for $q_i$; this would take time $O(B\ell)$, and viewing $B$
as a constant, this is $O(\ell)$. However, this is not really very desirable from either a
practical or theoretical point of view, and we can do much better with just a little
effort.

We shall first consider a special case; namely, the case where $\ell = 1$. In this case,
the computation of the quotient $\lfloor r / B^i b \rfloor$ is facilitated by the following theorem,
which essentially tells us that this quotient is determined by the two high-order
digits of $r$:

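Theorem 3.1. Let $x$ and $y$ be integers such that

$$0 \le x = x'2^n + s \quad \text{and} \quad 0 < y = y'2^n$$

for some integers $n, s, x', y'$, with $n \ge 0$ and $0 \le s < 2^n$. Then $\lfloor x/y \rfloor = \lfloor x'/y' \rfloor$.
Proof. We have

$$\frac{x}{y} = \frac{x'}{y'} + \frac{s}{y'2^n} \ge \frac{x'}{y'}.$$

It follows immediately that $\lfloor x/y \rfloor \ge \lfloor x'/y' \rfloor$.

We also have

$$\frac{x}{y} = \frac{x'}{y'} + \frac{s}{y'2^n} < \frac{x'}{y'} + \frac{1}{y'} \le \left( \left\lfloor \frac{x'}{y'} \right\rfloor + \frac{y'-1}{y'} \right) + \frac{1}{y'} = \left\lfloor \frac{x'}{y'} \right\rfloor + 1.$$

Thus, we have $x/y < \lfloor x'/y' \rfloor + 1$, and hence, $\lfloor x/y \rfloor \le \lfloor x'/y' \rfloor$. □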
From this theorem, one sees that the following algorithm correctly computes the
quotient and remainder in time $O(k)$ (in the case $\ell = 1$):

    hi ← 0
    for i ← k − 1 down to 0 do
        tmp ← hi · B + a_i
        (q_i, hi) ← QuoRem(tmp, b_0)
    output the quotient q = (q_{k−1} · · · q_0)_B and the remainder hi

Note that in every loop iteration, the value of hi lies between 0 and $b_0 \le B - 1$,
and the value of tmp lies between 0 and $B \cdot b_0 + (B - 1) \le B^2 - 1$.
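For the special case $\ell = 1$, the loop above can be sketched in Python as follows, assuming $B = 2^{15}$, low-order-first digit lists, and a single divisor digit $0 < b_0 < B$; the function name `divmod_single` is hypothetical.

```python
B = 2 ** 15  # assumed base (a power of 2)


def divmod_single(a: list[int], b0: int) -> tuple[list[int], int]:
    """Divide unsigned digit vector a (low-order first) by a digit 0 < b0 < B."""
    k = len(a)
    q = [0] * k
    hi = 0
    for i in range(k - 1, -1, -1):  # high-order digit first
        tmp = hi * B + a[i]  # hi < b0, so tmp < B * b0
        q[i], hi = divmod(tmp, b0)
    return q, hi  # quotient digits and remainder
```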

That takes care of the special case where $\ell = 1$. Now we turn to the general case
$\ell \ge 1$. In this case, we cannot so easily get the digits $q_i$ of the quotient, but we can
still fairly easily estimate these digits, using the following:

Theorem 3.2. Let $x$ and $y$ be integers such that

$$0 \le x = x'2^n + s \quad \text{and} \quad 0 < y = y'2^n + t$$

for some integers $n, s, t, x', y'$ with $n \ge 0$, $0 \le s < 2^n$, and $0 \le t < 2^n$. Further,
suppose that $2y' \ge x/y$. Then

$$\lfloor x/y \rfloor \le \lfloor x'/y' \rfloor \le \lfloor x/y \rfloor + 2.$$

Proof. We have $x/y \le x/(y'2^n)$, and so $\lfloor x/y \rfloor \le \lfloor x/(y'2^n) \rfloor$, and by the previous
theorem, $\lfloor x/(y'2^n) \rfloor = \lfloor x'/y' \rfloor$. That proves the first inequality.

For the second inequality, first note that from the definitions, we have $x/y \ge x'/(y'+1)$,
which implies $x'y - xy' - x \le 0$. Further, $2y' \ge x/y$ implies $2yy' - x \ge 0$.
So we have $2yy' - x \ge 0 \ge x'y - xy' - x$, which implies $x/y \ge x'/y' - 2$, and
hence $\lfloor x/y \rfloor \ge \lfloor x'/y' \rfloor - 2$. □
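Theorem 3.2 can be sanity-checked on small cases. The following Python sketch (a numerical illustration, not a proof) enumerates $x'$, $y' \ge 1$, $s$, and $t$ for a fixed $n$, skips cases where the hypothesis $2y' \ge x/y$ (equivalently $2y'y \ge x$) fails, and asserts both inequalities; the function name and the search bounds are arbitrary choices.

```python
def check_theorem_3_2(n: int, max_x1: int, max_y1: int) -> int:
    """Exhaustively check floor(x/y) <= floor(x'/y') <= floor(x/y) + 2.

    Returns the number of (x', y', s, t) cases that met the hypotheses.
    """
    checked = 0
    for x1 in range(max_x1 + 1):
        for y1 in range(1, max_y1 + 1):  # y' >= 1 so x'/y' is defined
            for s in range(2 ** n):
                for t in range(2 ** n):
                    x = x1 * 2 ** n + s
                    y = y1 * 2 ** n + t
                    if 2 * y1 * y < x:  # hypothesis 2y' >= x/y fails
                        continue
                    q, q1 = x // y, x1 // y1
                    assert q <= q1 <= q + 2, (x1, y1, s, t)
                    checked += 1
    return checked
```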

Based on this theorem, we first present an algorithm for division with remainder
that works if we assume that $b$ is appropriately “normalized,” meaning that
$b_{\ell-1} \ge 2^{w-1}$, where $B = 2^w$. This algorithm is shown in Fig. 3.1.

Some remarks are in order.

1. In line 4, we compute $q_i$, which by Theorem 3.2 is greater than or equal to
the true quotient digit, but exceeds this value by at most 2.

2. In line 5, we reduce $q_i$ if it is obviously too big.

3. In lines 6-10, we compute

$$(r_{i+\ell} \cdots r_i)_B \leftarrow (r_{i+\ell} \cdots r_i)_B - q_i b.$$

In each loop iteration, the value of tmp lies between $-(B^2 - B)$ and $B - 1$,
and the value of carry lies between $-(B - 1)$ and 0.
 1. for i ← 0 to k − 1 do r_i ← a_i
 2. r_k ← 0
 3. for i ← k − ℓ down to 0 do
 4.     q_i ← ⌊(r_{i+ℓ} B + r_{i+ℓ−1}) / b_{ℓ−1}⌋
 5.     if q_i ≥ B then q_i ← B − 1
 6.     carry ← 0
 7.     for j ← 0 to ℓ − 1 do
 8.         tmp ← r_{i+j} − q_i b_j + carry
 9.         (carry, r_{i+j}) ← QuoRem(tmp, B)
10.     r_{i+ℓ} ← r_{i+ℓ} + carry
11.     while r_{i+ℓ} < 0 do
12.         carry ← 0
13.         for j ← 0 to ℓ − 1 do
14.             tmp ← r_{i+j} + b_j + carry
15.             (carry, r_{i+j}) ← QuoRem(tmp, B)
16.         r_{i+ℓ} ← r_{i+ℓ} + carry
17.         q_i ← q_i − 1
18. output the quotient q = (q_{k−ℓ} · · · q_0)_B
        and the remainder r = (r_{ℓ−1} · · · r_0)_B

Fig. 3.1. Division with Remainder Algorithm


4. If the estimate $q_i$ is too large, this is manifested by a negative value of $r_{i+\ell}$
at line 10. Lines 11-17 detect and correct this condition: the loop body
here executes at most twice; in lines 12-16, we compute

$$(r_{i+\ell} \cdots r_i)_B \leftarrow (r_{i+\ell} \cdots r_i)_B + (b_{\ell-1} \cdots b_0)_B.$$

Just as in the algorithm in §3.3.1, in every iteration of the loop in lines
13-15, the value of carry is 0 or 1, and the value tmp lies between 0 and
$2B - 1$.

It is easily verified that the running time of the above algorithm is $O(\ell \cdot (k - \ell + 1))$.

Finally, consider the general case, where $b$ may not be normalized. We multiply
both $a$ and $b$ by an appropriate value $2^{w'}$, with $0 \le w' < w$, obtaining $a' := a2^{w'}$
and $b' := b2^{w'}$, where $b'$ is normalized; alternatively, we can use a more efficient,
special-purpose “left shift” algorithm to achieve the same effect. We then compute
$q$ and $r'$ such that $a' = b'q + r'$, using the division algorithm in Fig. 3.1. Observe
that $q = \lfloor a'/b' \rfloor = \lfloor a/b \rfloor$, and $r' = r2^{w'}$, where $r = a \bmod b$. To recover $r$, we
simply divide $r'$ by $2^{w'}$, which we can do either using the above “single precision”
division algorithm, or by using a special-purpose “right shift” algorithm. All of
this normalizing and denormalizing takes time $O(k + \ell)$. Thus, the total running
time for division with remainder is still $O(\ell \cdot (k - \ell + 1))$.
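Putting the pieces together, here is a Python sketch of Fig. 3.1 plus the normalization step, assuming $B = 2^w$ with $w = 15$. Python's `<<` and `>>` stand in for the special-purpose shift routines of Exercises 3.20 and 3.21, and `to_digits`/`from_digits` are hypothetical helpers that convert between built-in integers and digit vectors; comments refer to the line numbers of Fig. 3.1.

```python
W = 15
B = 2 ** W  # assumed base B = 2^w with w = 15


def to_digits(x: int) -> list[int]:
    """Unsigned base-B digit vector of x >= 0, low-order digit first."""
    digits = []
    while True:
        x, d = divmod(x, B)
        digits.append(d)
        if x == 0:
            return digits


def from_digits(digits: list[int]) -> int:
    """Value of a base-B digit vector (low-order digit first)."""
    return sum(d * B ** i for i, d in enumerate(digits))


def divide_normalized(a: list[int], b: list[int]) -> tuple[list[int], list[int]]:
    """Fig. 3.1; assumes len(a) >= len(b) >= 1 and b[-1] >= B // 2."""
    k, l = len(a), len(b)
    r = a + [0]                      # lines 1-2: copy a, set r_k = 0
    q = [0] * (k - l + 1)
    for i in range(k - l, -1, -1):   # line 3
        qi = (r[i + l] * B + r[i + l - 1]) // b[l - 1]  # line 4 (Theorem 3.2)
        if qi >= B:                  # line 5
            qi = B - 1
        carry = 0
        for j in range(l):           # lines 6-10: subtract qi * b
            tmp = r[i + j] - qi * b[j] + carry
            carry, r[i + j] = divmod(tmp, B)
        r[i + l] += carry
        while r[i + l] < 0:          # lines 11-17: add b back (at most twice)
            carry = 0
            for j in range(l):
                tmp = r[i + j] + b[j] + carry
                carry, r[i + j] = divmod(tmp, B)
            r[i + l] += carry
            qi -= 1
        q[i] = qi
    return q, r[:l]


def divide(a: int, b: int) -> tuple[int, int]:
    """(floor(a / b), a mod b) for a >= 0 and b > 0, by normalizing b first."""
    top = to_digits(b)[-1]
    shift = W - top.bit_length()      # w' with 0 <= w' < w
    a_digits = to_digits(a << shift)  # a' = a * 2^{w'}
    b_digits = to_digits(b << shift)  # b' = b * 2^{w'}, now normalized
    if len(a_digits) < len(b_digits):
        return 0, a                   # a < b
    q, r = divide_normalized(a_digits, b_digits)
    return from_digits(q), from_digits(r) >> shift  # r = r' / 2^{w'}
```

`divide(a, b)` should agree with Python's `divmod(a, b)` for every $a \ge 0$ and $b > 0$, which makes a convenient test oracle.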


Exercise 3.17. Work out the details of algorithms for arithmetic on signed inte- 
gers, using the above algorithms for unsigned integers as subroutines. You should 
give algorithms for addition, subtraction, multiplication, and division with remain- 
der of arbitrary signed integers (for division with remainder, your algorithm should 
compute $\lfloor a/b \rfloor$ and $a \bmod b$). Make sure your algorithms correctly compute the
sign bit of the results, and also strip any leading zero digits from the results. 

Exercise 3.18. Work out the details of an algorithm that compares two signed 
integers a and b, determining which of a < b, a = b, or a > b holds. 

Exercise 3.19. Suppose that we run the division with remainder algorithm in
Fig. 3.1 for $\ell > 1$ without normalizing $b$, but instead, we compute the value $q_i$ in
line 4 as follows:

$$q_i \leftarrow \left\lfloor \frac{r_{i+\ell} B^2 + r_{i+\ell-1} B + r_{i+\ell-2}}{b_{\ell-1} B + b_{\ell-2}} \right\rfloor.$$

Show that $q_i$ is either equal to the correct quotient digit, or the correct quotient digit
plus 1. Note that a limitation of this approach is that the numbers involved in the
computation are larger than $B^2$.

Exercise 3.20. Work out the details for an algorithm that shifts a given unsigned 
integer $a$ to the left by a specified number of bits $s$ (i.e., computes $b := a \cdot 2^s$).
The running time of your algorithm should be linear in the number of digits of the 
output. 

Exercise 3.21. Work out the details for an algorithm that shifts a given unsigned 
integer $a$ to the right by a specified number of bits $s$ (i.e., computes $b := \lfloor a/2^s \rfloor$).
The running time of your algorithm should be linear in the number of digits of the
output. Now modify your algorithm so that it correctly computes $\lfloor a/2^s \rfloor$ for signed
integers $a$.

Exercise 3.22. This exercise is for C/Java programmers. Evaluate the C/Java
expressions

    (-17) % 4;  (-17) & 3;

and compare these values with $(-17) \bmod 4$. Also evaluate the C/Java expressions

    (-17) / 4;  (-17) >> 2;





and compare with $\lfloor -17/4 \rfloor$. Explain your findings.

Exercise 3.23. This exercise is also for C/Java programmers. Suppose that
values of type int are stored using a 32-bit 2’s complement representation, and
that all basic arithmetic operations are computed correctly modulo $2^{32}$, even if an
“overflow” happens to occur. Also assume that double precision floating point
has 53 bits of precision, and that all basic arithmetic operations give a result with
a relative error of at most $2^{-53}$. Also assume that conversion from type int to
double is exact, and that conversion from double to int truncates the fractional
part. Now, suppose we are given int variables a, b, and n, such that $1 < n < 2^{30}$,
$0 \le a < n$, and $0 \le b < n$. Show that after the following code sequence is
executed, the value of r is equal to $(a \cdot b) \bmod n$:

    int q, r;

    q = (int) ((((double) a) * ((double) b)) / ((double) n));
    r = a*b - q*n;
    if (r >= n)
        r = r - n;
    else if (r < 0)
        r = r + n;


3.3.5 Summary 

We now summarize the results of this section. For an integer $a$, we define its bit
length, or simply, its length, which we denote by $\mathrm{len}(a)$, to be the number of bits
in the binary representation of $|a|$; more precisely,

$$\mathrm{len}(a) := \begin{cases} \lfloor \log_2 |a| \rfloor + 1 & \text{if } a \ne 0, \\ 1 & \text{if } a = 0. \end{cases}$$

If $\mathrm{len}(a) = \ell$, we say that $a$ is an $\ell$-bit integer. Notice that if $a$ is a positive, $\ell$-bit
integer, then $\log_2 a < \ell \le \log_2 a + 1$, or equivalently, $2^{\ell-1} \le a < 2^\ell$.
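In Python, `int.bit_length()` returns $\lfloor \log_2 |a| \rfloor + 1$ for $a \ne 0$ and 0 for $a = 0$, so $\mathrm{len}(a)$ reduces to a one-liner; `bit_len` is an illustrative name, not something defined in the text.

```python
def bit_len(a: int) -> int:
    """len(a): number of bits in the binary representation of |a|, with len(0) = 1."""
    return max(abs(a).bit_length(), 1)
```

For example, `bit_len(255)` is 8 and `bit_len(256)` is 9, consistent with $2^{\ell-1} \le a < 2^\ell$.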

Assuming that arbitrarily large integers are represented as described at the beginning
of this section, with a sign bit and a vector of base-$B$ digits, where $B$ is a
constant power of 2, we may state the following theorem.

Theorem 3.3. Let a and b be arbitrary integers. 

(i) We can compute $a \pm b$ in time $O(\mathrm{len}(a) + \mathrm{len}(b))$.

(ii) We can compute $a \cdot b$ in time $O(\mathrm{len}(a)\,\mathrm{len}(b))$.

(iii) If $b \ne 0$, we can compute the quotient $q := \lfloor a/b \rfloor$ and the remainder
$r := a \bmod b$ in time $O(\mathrm{len}(b)\,\mathrm{len}(q))$.


Note the bound $O(\mathrm{len}(b)\,\mathrm{len}(q))$ in part (iii) of this theorem, which may be
significantly less than the bound $O(\mathrm{len}(a)\,\mathrm{len}(b))$. A good way to remember this
bound is as follows: the time to compute the quotient and remainder is roughly the
same as the time to compute the product $bq$ appearing in the equality $a = bq + r$.

This theorem does not explicitly refer to the base $B$ in the underlying implementation.
The choice of $B$ affects the values of the implied big-O constants; while in
theory, this is of no significance, it does have a significant impact in practice.

From now on, we shall (for the most part) not worry about the implementation
details of long-integer arithmetic, and will just refer directly to this theorem.
However, we will occasionally exploit some trivial aspects of our data structure for
representing large integers. For example, it is clear that in constant time, we can
determine the sign of a given integer $a$, the bit length of $a$, and any particular bit of
the binary representation of $a$; moreover, as discussed in Exercises 3.20 and 3.21,
multiplications and divisions by powers of 2 can be computed in linear time via
“left shifts” and “right shifts.” It is also clear that we can convert between the base-2
representation of a given integer and our implementation’s internal representation
in linear time (other conversions may take longer; see Exercise 3.32).

We wish to stress the point that efficient algorithms on large integers should 
run in time bounded by a polynomial in the bit lengths of the inputs, rather than 
their magnitudes. For example, if the input to an algorithm is an $\ell$-bit integer $n$,
and if the algorithm runs in time $O(\ell^2)$, it will easily be able to process 1000-bit
inputs in a reasonable amount of time (a fraction of a second) on a typical, modern
computer. However, if the algorithm runs in time, say, $O(n^{1/2})$, this means that
on 1000-bit inputs, it will take roughly $2^{500}$ computing steps, which even on the
fastest computer available today or in the foreseeable future, will still be running 
long after our solar system no longer exists. 

A note on notation: “len” and “log.” In expressing the running times 
of algorithms in terms of an input $a$, we generally prefer to write $\mathrm{len}(a)$
rather than $\log a$. One reason is esthetic: writing $\mathrm{len}(a)$ stresses the fact
that the running time is a function of the bit length of $a$. Another reason is
technical: for big-O estimates involving functions on an arbitrary domain,
the appropriate inequalities should hold throughout the domain, and for 
this reason, it is very inconvenient to use functions, like log, which vanish 
or are undefined on some inputs. 


Exercise 3.24. Let $a, b \in \mathbb{Z}$ with $a \ge b > 0$, and let $q := \lfloor a/b \rfloor$. Show that
$\mathrm{len}(a) - \mathrm{len}(b) - 1 \le \mathrm{len}(q) \le \mathrm{len}(a) - \mathrm{len}(b) + 1$.





Exercise 3.25. Let $n_1, \ldots, n_k$ be positive integers. Show that

$$\sum_{i=1}^{k} \mathrm{len}(n_i) - k \le \mathrm{len}\Big(\prod_{i=1}^{k} n_i\Big) \le \sum_{i=1}^{k} \mathrm{len}(n_i).$$

Exercise 3.26. Show that given integers $n_1, \ldots, n_k$, with each $n_i > 1$, we can
compute the product $n := \prod_i n_i$ in time $O(\mathrm{len}(n)^2)$.

Exercise 3.27. Show that given integers $a, n_1, \ldots, n_k$, with each $n_i > 1$, where
$0 \le a < n := \prod_i n_i$, we can compute $(a \bmod n_1, \ldots, a \bmod n_k)$ in time $O(\mathrm{len}(n)^2)$.

Exercise 3.28. Show that given integers $n_1, \ldots, n_k$, with each $n_i > 1$, we can
compute $(n/n_1, \ldots, n/n_k)$, where $n := \prod_i n_i$, in time $O(\mathrm{len}(n)^2)$.

Exercise 3.29. This exercise develops an algorithm to compute $\lfloor \sqrt{n} \rfloor$ for a given
positive integer $n$. Consider the following algorithm:

    k ← ⌊(len(n) − 1)/2⌋, m ← 2^k
    for i ← k − 1 down to 0 do
        if (m + 2^i)^2 ≤ n then m ← m + 2^i
    output m

(a) Show that this algorithm correctly computes $\lfloor \sqrt{n} \rfloor$.

(b) In a straightforward implementation of this algorithm, each loop iteration
takes time $O(\mathrm{len}(n)^2)$, yielding a total running time of $O(\mathrm{len}(n)^3)$.
Give a more careful implementation, so that each loop iteration takes time
$O(\mathrm{len}(n))$, yielding a total running time of $O(\mathrm{len}(n)^2)$.
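The algorithm as stated in Exercise 3.29 (the straightforward version, not the faster implementation asked for in part (b)) can be sketched in Python as follows; `floor_sqrt` is a hypothetical name, and `n.bit_length()` plays the role of $\mathrm{len}(n)$ for $n \ge 1$.

```python
def floor_sqrt(n: int) -> int:
    """Bit-by-bit computation of floor(sqrt(n)) for n >= 1, as stated in Exercise 3.29."""
    k = (n.bit_length() - 1) // 2  # len(n) = n.bit_length() for n >= 1
    m = 1 << k                     # m <- 2^k
    for i in range(k - 1, -1, -1):
        if (m + (1 << i)) ** 2 <= n:  # try setting bit i of m
            m += 1 << i
    return m
```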

Exercise 3.30. Modify the algorithm in the previous exercise so that given positive
integers $n$ and $e$, with $n \ge 2^e$, it computes $\lfloor n^{1/e} \rfloor$ in time $O(\mathrm{len}(n)^3/e)$.

Exercise 3.31. An integer $n > 1$ is called a perfect power if $n = a^b$ for some
integers $a > 1$ and $b > 1$. Using the algorithm from the previous exercise, design
an efficient algorithm that determines if a given $n$ is a perfect power, and if it is,
also computes $a$ and $b$ such that $n = a^b$, where $a > 1$, $b > 1$, and $a$ is as small as
possible. Your algorithm should run in time $O(\ell^3 \mathrm{len}(\ell))$, where $\ell := \mathrm{len}(n)$.

Exercise 3.32. Show how to convert (in both directions) in time $O(\mathrm{len}(n)^2)$
between the base-10 representation and our implementation’s internal representation
of an integer $n$.


3.4 Computing in $\mathbb{Z}_n$

Let $n$ be a positive integer. For every $\alpha \in \mathbb{Z}_n$, there exists a unique integer
$a \in \{0, \ldots, n-1\}$ such that $\alpha = [a]_n$; we call this integer $a$ the canonical
representative of $\alpha$, and denote it by $\mathrm{rep}(\alpha)$. For computational purposes, we
represent elements of $\mathbb{Z}_n$ by their canonical representatives.

Addition and subtraction in $\mathbb{Z}_n$ can be performed in time $O(\mathrm{len}(n))$: given
$\alpha, \beta \in \mathbb{Z}_n$, to compute $\mathrm{rep}(\alpha + \beta)$, we first compute the integer sum $\mathrm{rep}(\alpha) + \mathrm{rep}(\beta)$,
and then subtract $n$ if the result is greater than or equal to $n$; similarly, to compute
$\mathrm{rep}(\alpha - \beta)$, we compute the integer difference $\mathrm{rep}(\alpha) - \mathrm{rep}(\beta)$, adding $n$ if
the result is negative. Multiplication in $\mathbb{Z}_n$ can be performed in time $O(\mathrm{len}(n)^2)$:
given $\alpha, \beta \in \mathbb{Z}_n$, we compute $\mathrm{rep}(\alpha \cdot \beta)$ as $\mathrm{rep}(\alpha)\,\mathrm{rep}(\beta) \bmod n$, using one integer
multiplication and one division with remainder.
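Operating on canonical representatives, the three operations above can be sketched in Python as follows; the function names are hypothetical, and each argument is assumed to already lie in $\{0, \ldots, n-1\}$.

```python
def add_mod(x: int, y: int, n: int) -> int:
    """rep(alpha + beta) from canonical reps x, y in {0, ..., n-1}."""
    s = x + y
    return s - n if s >= n else s  # one conditional subtraction


def sub_mod(x: int, y: int, n: int) -> int:
    """rep(alpha - beta): add n back if the integer difference is negative."""
    d = x - y
    return d + n if d < 0 else d


def mul_mod(x: int, y: int, n: int) -> int:
    """rep(alpha * beta): one integer multiplication, one division with remainder."""
    return (x * y) % n
```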

A note on notation: “rep,” “mod,” and “$[\cdot]_n$.” In describing algorithms,
as well as in other contexts, if $\alpha, \beta$ are elements of $\mathbb{Z}_n$, we may write, for
example, $\gamma \leftarrow \alpha + \beta$ or $\gamma \leftarrow \alpha\beta$, and it is understood that elements of
$\mathbb{Z}_n$ are represented by their canonical representatives as discussed above,
and arithmetic on canonical representatives is done modulo $n$. Thus, we
have in mind a “strongly typed” language for our pseudo-code that makes
a clear distinction between integers in the set $\{0, \ldots, n-1\}$ and elements
of $\mathbb{Z}_n$. If $a \in \mathbb{Z}$, we can convert $a$ to an object $\alpha \in \mathbb{Z}_n$ by writing $\alpha \leftarrow [a]_n$,
and if $a \in \{0, \ldots, n-1\}$, this type conversion is purely conceptual, involving
no actual computation. Conversely, if $\alpha \in \mathbb{Z}_n$, we can convert $\alpha$ to
an object $a \in \{0, \ldots, n-1\}$, by writing $a \leftarrow \mathrm{rep}(\alpha)$; again, this type
conversion is purely conceptual, and involves no actual computation. It
is perhaps also worthwhile to stress the distinction between $a \bmod n$ and
$[a]_n$: the former denotes an element of the set $\{0, \ldots, n-1\}$, while the
latter denotes an element of $\mathbb{Z}_n$.

Another interesting problem is exponentiation in $\mathbb{Z}_n$: given $\alpha \in \mathbb{Z}_n$ and a non-negative
integer $e$, compute $\alpha^e \in \mathbb{Z}_n$. Perhaps the most obvious way to do this is to
iteratively multiply by $\alpha$ a total of $e$ times, requiring time $O(e\,\mathrm{len}(n)^2)$. For small
values of $e$, this is fine; however, a much faster algorithm, the repeated-squaring
algorithm, computes $\alpha^e$ using just $O(\mathrm{len}(e))$ multiplications in $\mathbb{Z}_n$, thus taking
time $O(\mathrm{len}(e)\,\mathrm{len}(n)^2)$.

This method is based on the following observation. Let $e = (b_{\ell-1} \cdots b_0)_2$ be
the binary expansion of $e$ (where $b_0$ is the low-order bit). For $i = 0, \ldots, \ell$, define
$e_i := \lfloor e/2^i \rfloor$; the binary expansion of $e_i$ is $e_i = (b_{\ell-1} \cdots b_i)_2$. Also define $\beta_i := \alpha^{e_i}$
for $i = 0, \ldots, \ell$, so $\beta_\ell = 1$ and $\beta_0 = \alpha^e$. Then we have

$$e_i = 2e_{i+1} + b_i \quad \text{and} \quad \beta_i = \beta_{i+1}^2 \cdot \alpha^{b_i} \quad \text{for } i = 0, \ldots, \ell - 1.$$

This observation yields the following algorithm for computing $\alpha^e$:

The repeated-squaring algorithm. On input $\alpha, e$, where $\alpha \in \mathbb{Z}_n$ and $e$ is a non-negative
integer, do the following, where $e = (b_{\ell-1} \cdots b_0)_2$ is the binary expansion
of $e$:

    β ← [1]_n
    for i ← ℓ − 1 down to 0 do
        β ← β^2
        if b_i = 1 then β ← β · α
    output β

It is clear that when this algorithm terminates, we have $\beta = \alpha^e$, and that the
running-time estimate is as claimed above. Indeed, the algorithm uses $\ell$ squarings
in $\mathbb{Z}_n$, and at most $\ell$ additional multiplications in $\mathbb{Z}_n$.
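A Python sketch of the repeated-squaring algorithm, working on canonical representatives and scanning the bits of $e$ from high order to low order; `power_mod` is an illustrative name, and Python's built-in three-argument `pow` is used only as a cross-check in the tests.

```python
def power_mod(alpha: int, e: int, n: int) -> int:
    """Left-to-right repeated squaring: alpha^e in Z_n, for e >= 0."""
    beta = 1 % n  # [1]_n, which is 0 when n == 1
    for i in range(e.bit_length() - 1, -1, -1):  # bits b_{l-1}, ..., b_0
        beta = (beta * beta) % n  # beta <- beta^2
        if (e >> i) & 1:  # b_i = 1
            beta = (beta * alpha) % n  # beta <- beta * alpha
    return beta
```

With $e = 37$, the loop performs exactly the six squarings and three multiplications listed in Example 3.5.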


Example 3.5. Suppose $e = 37 = (100101)_2$. The above algorithm performs the
following operations in this case:

                                  // computed exponent (in binary)
    β ← [1]                       // 0
    β ← β^2, β ← β · α            // 1
    β ← β^2                       // 10
    β ← β^2                       // 100
    β ← β^2, β ← β · α            // 1001
    β ← β^2                       // 10010
    β ← β^2, β ← β · α            // 100101

□


The repeated-squaring algorithm has numerous applications. We mention a few 
here, but we will see many more later on. 

Computing multiplicative inverses in $\mathbb{Z}_p$. Suppose we are given a prime $p$ and an
element $\alpha \in \mathbb{Z}_p^*$, and we want to compute $\alpha^{-1}$. By Euler’s theorem (Theorem 2.13),
we have $\alpha^{p-1} = 1$, and multiplying this equation by $\alpha^{-1}$, we obtain $\alpha^{p-2} = \alpha^{-1}$.
Thus, we can use the repeated-squaring algorithm to compute $\alpha^{-1}$ by raising $\alpha$ to
the power $p - 2$. This algorithm runs in time $O(\mathrm{len}(p)^3)$. While this is reasonably
efficient, we will develop an even more efficient method in the next chapter, using
Euclid’s algorithm (which also works with any modulus, not just a prime modulus).
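As a sketch, the inverse computation is a single exponentiation. Here Python's built-in three-argument `pow` stands in for the repeated-squaring algorithm; the function name is hypothetical, and $p$ is assumed prime with $0 < a < p$.

```python
def inverse_mod_prime(a: int, p: int) -> int:
    """Inverse of [a]_p in Z_p^* as a^(p-2) mod p; assumes p prime, 0 < a < p."""
    return pow(a, p - 2, p)  # built-in modular exponentiation
```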

Testing quadratic residuosity. Suppose we are given an odd prime $p$ and an
element $\alpha \in \mathbb{Z}_p^*$, and we want to test whether $\alpha \in (\mathbb{Z}_p^*)^2$. By Euler’s criterion
(Theorem 2.21), we have $\alpha \in (\mathbb{Z}_p^*)^2$ if and only if $\alpha^{(p-1)/2} = 1$. Thus, we can
use the repeated-squaring algorithm to test if $\alpha \in (\mathbb{Z}_p^*)^2$ by raising $\alpha$ to the power
$(p - 1)/2$. This algorithm runs in time $O(\mathrm{len}(p)^3)$. While this is also reasonably
efficient, we will develop an even more efficient method later in the text (in Chapter
12).
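Similarly, Euler's criterion becomes one exponentiation. This sketch again uses Python's built-in `pow` in place of repeated squaring and assumes $p$ is an odd prime with $0 < a < p$; the function name is hypothetical.

```python
def is_quadratic_residue(a: int, p: int) -> bool:
    """Euler's criterion: [a]_p is a square in Z_p^* iff a^((p-1)/2) = 1 mod p."""
    return pow(a, (p - 1) // 2, p) == 1
```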

Testing for primality. Suppose we are given an integer n > 1, and we want 
to determine whether n is prime or composite. For large n, searching for prime 
factors of n is hopelessly impractical. A better idea is to use Euler’s theorem. 
combined with the repeated-squaring algorithm: we know that if $n$ is prime, then
every non-zero $\alpha \in \mathbb{Z}_n$ satisfies $\alpha^{n-1} = 1$. Conversely, if $n$ is composite, there
exists a non-zero $\alpha \in \mathbb{Z}_n$ such that $\alpha^{n-1} \ne 1$ (see Exercise 2.27). This suggests the
following “trial and error” strategy for testing if $n$ is prime:

    repeat k times
        choose α ∈ Z_n \ {[0]}
        compute β ← α^(n−1)
        if β ≠ 1 output “composite” and halt
    output “maybe prime”

As stated, this is not a fully specified algorithm: we have to specify the loop- 
iteration parameter k, and more importantly, we have to specify a procedure for 
choosing $\alpha$ in each loop iteration. One approach might be to just try $\alpha = [1], [2],
[3], \ldots$. Another might be to choose $\alpha$ at random in each loop iteration: this would
be an example of a probabilistic algorithm (a notion we shall discuss in detail in
Chapter 9). In any case, if the algorithm outputs “composite,” we may conclude
that $n$ is composite (even though the algorithm does not find a non-trivial factor of
$n$). However, if the algorithm completes all $k$ loop iterations and outputs “maybe
prime,” it is not clear what we should conclude: certainly, we have some reason to
suspect that $n$ is prime, but not really a proof; indeed, it may be the case that $n$ is
composite, but we were just unlucky in all of our choices for $\alpha$. Thus, while this
rough idea does not quite give us an effective primality test, it is not a bad start, and 
is the basis for several effective primality tests (a couple of which we shall discuss 
in detail in Chapters 10 and 21). 
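The randomized variant of the trial-and-error strategy can be sketched in Python as follows, choosing $\alpha$ uniformly from the non-zero elements of $\mathbb{Z}_n$ and assuming $n > 1$. The name `maybe_prime` and the default $k = 20$ are arbitrary choices, and the built-in `pow` stands in for repeated squaring. A return value of `False` certifies that $n$ is composite; `True` means only “maybe prime.”

```python
import random


def maybe_prime(n: int, k: int = 20) -> bool:
    """Trial-and-error strategy from the text, for n > 1.

    False: n is certainly composite. True: "maybe prime" only.
    """
    for _ in range(k):
        a = random.randrange(1, n)  # a non-zero element of Z_n
        if pow(a, n - 1, n) != 1:   # alpha^(n-1) != 1 witnesses compositeness
            return False
    return True
```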

Exercise 3.33. The repeated-squaring algorithm we have presented here
processes the bits of the exponent from left to right (i.e., from high order to low
order). Develop an algorithm for exponentiation in $\mathbb{Z}_n$ with similar complexity that
processes the bits of the exponent from right to left.

Exercise 3.34. Show that given a prime $p$, $\alpha \in \mathbb{Z}_p$, and an integer $e \ge p$, we can
compute $\alpha^e$ in time $O(\mathrm{len}(e)\,\mathrm{len}(p) + \mathrm{len}(p)^3)$.

The following exercises develop some important efficiency improvements to the 
basic repeated-squaring algorithm. 

Exercise 3.35. The goal of this exercise is to develop a “$2^t$-ary” variant of the
above repeated-squaring algorithm, in which the exponent is effectively treated as
a number in base $2^t$, for some parameter $t$, rather than in base 2. Let $\alpha \in \mathbb{Z}_n$ and
let $e$ be a positive integer of length $\ell$. Let us write $e$ in base $2^t$ as $e = (e_k \cdots e_0)_{2^t}$,
where $e_k \ne 0$. Consider the following algorithm:

    compute a table of values T[0 . . . 2^t − 1],
        where T[j] := α^j for j = 0, . . . , 2^t − 1
    β ← T[e_k]
    for i ← k − 1 down to 0 do
        β ← β^(2^t) · T[e_i]

(a) Show that this algorithm correctly computes $\alpha^e$, and work out the implementation
details; in particular, show that it may be implemented in such a
way that it uses at most $\ell$ squarings and $2^t + \ell/t + O(1)$ additional multiplications
in $\mathbb{Z}_n$.

(b) Show that, by appropriately choosing the parameter $t$, we can bound the
number of multiplications in $\mathbb{Z}_n$ (besides the squarings) by $O(\ell/\mathrm{len}(\ell))$.
Thus, from an asymptotic point of view, the cost of exponentiation is essentially
the cost of about $\ell$ squarings in $\mathbb{Z}_n$.

(c) Improve the algorithm so that it only uses no more than $\ell$ squarings and
$2^{t-1} + \ell/t + O(1)$ additional multiplications in $\mathbb{Z}_n$. Hint: build a table that
contains only the odd powers of $\alpha$ among $\alpha^0, \alpha^1, \ldots, \alpha^{2^t - 1}$.
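For concreteness, here is a Python sketch of the algorithm exactly as displayed in the exercise (with the full table of $2^t$ entries, not the odd-powers refinement of part (c)); `power_mod_2t` is a hypothetical name, and $\beta^{2^t}$ is computed by $t$ successive squarings.

```python
def power_mod_2t(alpha: int, e: int, n: int, t: int) -> int:
    """2^t-ary exponentiation as displayed in Exercise 3.35 (e > 0, t >= 1)."""
    # table T[j] = alpha^j for j = 0, ..., 2^t - 1
    table = [1 % n]
    for _ in range(2 ** t - 1):
        table.append((table[-1] * alpha) % n)
    # base-2^t digits e_0, ..., e_k of e, low-order first
    digits = []
    while e > 0:
        digits.append(e & (2 ** t - 1))
        e >>= t
    beta = table[digits[-1]]  # beta <- T[e_k]
    for d in reversed(digits[:-1]):  # e_{k-1}, ..., e_0
        for _ in range(t):  # beta <- beta^(2^t) via t squarings
            beta = (beta * beta) % n
        beta = (beta * table[d]) % n  # beta <- beta * T[e_i]
    return beta
```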

Exercise 3.36. Suppose we are given $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n$, along with non-negative
integers $e_1, \ldots, e_k$, where $\mathrm{len}(e_i) \le \ell$ for $i = 1, \ldots, k$. Show how to compute
$\beta := \alpha_1^{e_1} \cdots \alpha_k^{e_k}$, using at most $\ell$ squarings and $\ell + 2^k$ additional multiplications
in $\mathbb{Z}_n$. Your algorithm should work in two phases: the first phase uses only the
values $\alpha_1, \ldots, \alpha_k$, and performs at most $2^k$ multiplications in $\mathbb{Z}_n$; in the second
phase, the algorithm computes $\beta$, using the exponents $e_1, \ldots, e_k$, along with the
data computed in the first phase, and performs at most $\ell$ squarings and $\ell$ additional
multiplications in $\mathbb{Z}_n$.

Exercise 3.37. Suppose that we are to compute $\alpha^e$, where $\alpha \in \mathbb{Z}_n$, for many
exponents $e$ of length at most $\ell$, but with $\alpha$ fixed. Show that for every positive
integer parameter $k$, we can make a pre-computation (depending on $\alpha$, $\ell$, and $k$)
that uses at most $\ell$ squarings and $2^k$ additional multiplications in $\mathbb{Z}_n$, so that after
the pre-computation, we can compute $\alpha^e$ for every exponent $e$ of length at most $\ell$
using at most $\ell/k + O(1)$ squarings and $\ell/k + O(1)$ additional multiplications in
$\mathbb{Z}_n$. Hint: use the algorithm in the previous exercise.

Exercise 3.38. Suppose we are given $\alpha \in \mathbb{Z}_n$, along with non-negative integers
$e_1, \ldots, e_r$, where $\mathrm{len}(e_i) \le \ell$ for $i = 1, \ldots, r$, and $r = O(\mathrm{len}(\ell))$. Using the
previous exercise, show how to compute $(\alpha^{e_1}, \ldots, \alpha^{e_r})$ using $O(\ell)$ multiplications
in $\mathbb{Z}_n$.

Exercise 3.39. Suppose we are given $\alpha \in \mathbb{Z}_n$, along with integers $m_1, \ldots, m_r$,
with each $m_i > 1$. Let $m := \prod_i m_i$. Also, for $i = 1, \ldots, r$, let $m_i^* := m/m_i$.
Show how to compute $(\alpha^{m_1^*}, \ldots, \alpha^{m_r^*})$ using $O(\mathrm{len}(r)\,\ell)$ multiplications in $\mathbb{Z}_n$, where
$\ell := \mathrm{len}(m)$. Hint: divide and conquer. Note that if $r = O(\mathrm{len}(\ell))$, then using the
previous exercise, we can solve this problem using just $O(\ell)$ multiplications.

Exercise 3.40. Let $k$ be a constant, positive integer. Suppose we are given
$\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n$, along with non-negative integers $e_1, \ldots, e_k$, where $\mathrm{len}(e_i) \le \ell$ for
$i = 1, \ldots, k$. Show how to compute the value $\alpha_1^{e_1} \cdots \alpha_k^{e_k}$, using at most $\ell$ squarings
and $O(\ell/\mathrm{len}(\ell))$ additional multiplications in $\mathbb{Z}_n$. Hint: develop a $2^t$-ary version
of the algorithm in Exercise 3.36.


3.5 Faster integer arithmetic (*) 

The quadratic-time algorithms presented in §3.3 for integer multiplication and divi- 
sion are by no means the fastest possible. The next exercise develops a faster 
multiplication algorithm. 

Exercise 3.41. Suppose we have two positive integers $a$ and $b$, each of length at most $\ell$, such that $a = a_1 2^k + a_0$ and $b = b_1 2^k + b_0$, where $0 \le a_0 < 2^k$ and $0 \le b_0 < 2^k$. Then

$$ab = a_1 b_1 2^{2k} + (a_0 b_1 + a_1 b_0) 2^k + a_0 b_0.$$

Show how to compute the product $ab$ in time $O(\ell)$, given the products $a_0 b_0$, $a_1 b_1$, and $(a_0 - a_1)(b_0 - b_1)$. From this, design a recursive algorithm that computes $ab$ in time $O(\ell^{\log_2 3})$. (Note that $\log_2 3 \approx 1.58$.)
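For concreteness, here is one possible Python sketch of the recursive algorithm that Exercise 3.41 asks for (Karatsuba's method, mentioned again in §3.6). The `cutoff_bits` threshold, below which it falls back to Python's built-in multiplication, is an arbitrary choice of ours, as is taking the split point $k$ to be half the longer bit length. It relies on the identity $a_0 b_1 + a_1 b_0 = a_0 b_0 + a_1 b_1 - (a_0 - a_1)(b_0 - b_1)$.

```python
def karatsuba(a: int, b: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers a and b using three recursive products."""
    if a.bit_length() <= cutoff_bits or b.bit_length() <= cutoff_bits:
        return a * b  # base case: built-in multiplication on short inputs
    k = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << k) - 1
    a1, a0 = a >> k, a & mask           # a = a1 * 2^k + a0
    b1, b0 = b >> k, b & mask           # b = b1 * 2^k + b0
    hi = karatsuba(a1, b1, cutoff_bits)  # a1 * b1
    lo = karatsuba(a0, b0, cutoff_bits)  # a0 * b0
    # (a0 - a1)(b0 - b1) may be negative; recurse on absolute values.
    da, db = a0 - a1, b0 - b1
    sign = -1 if (da < 0) != (db < 0) else 1
    cross = sign * karatsuba(abs(da), abs(db), cutoff_bits)
    middle = lo + hi - cross             # = a0*b1 + a1*b0
    return (hi << (2 * k)) + (middle << k) + lo
```

Each level does three recursive multiplications on numbers of about half the length plus $O(\ell)$ work for shifts, additions, and subtractions, which is where the $O(\ell^{\log_2 3})$ bound comes from.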

The algorithm in the previous exercise is also not the best possible. In fact, it is possible to multiply two integers of length at most $\ell$ on a RAM in time $O(\ell)$, but we do not explore this any further for the moment (see §3.6).

The following exercises explore the relationship between integer multiplication and related problems. We assume that we have an algorithm that multiplies two integers of length at most $\ell$ in time at most $M(\ell)$. It is convenient (and reasonable) to assume that $M$ is a well-behaved complexity function. By this, we mean that $M$ maps positive integers to positive real numbers, such that for some constant $\gamma \ge 1$, and all positive integers $a$ and $b$, we have

$$1 \le \frac{M(a + b)}{M(a) + M(b)} \le \gamma.$$

Exercise 3.42. Show that if M is a well-behaved complexity function, then it is 
strictly increasing. 




Exercise 3.43. Show that if $N(\ell) := M(\ell)/\ell$ is a non-decreasing function, and $M(2\ell)/M(\ell) = O(1)$, then $M$ is a well-behaved complexity function.

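Exercise 3.44. Let $\alpha > 0$, $\beta \ge 1$, $\gamma \ge 0$, $\delta \ge 0$ be real constants. Show that

$$M(\ell) := \alpha \ell^{\beta} \operatorname{len}(\ell)^{\gamma} \operatorname{len}(\operatorname{len}(\ell))^{\delta}$$

is a well-behaved complexity function.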

Exercise 3.45. Show that given integers $n > 1$ and $e > 1$, we can compute $n^e$ in time $O(M(\operatorname{len}(n^e)))$.

Exercise 3.46. Give an algorithm for Exercise 3.26 whose running time is $O(M(\operatorname{len}(n))\operatorname{len}(k))$. Hint: divide and conquer.

Exercise 3.47. In the previous exercise, suppose all the inputs $n_i$ have the same length, and that $M(\ell) = \alpha \ell^{\beta}$, where $\alpha$ and $\beta$ are constants with $\alpha > 0$ and $\beta > 1$. Show that your algorithm runs in time $O(M(\operatorname{len}(n)))$.

Exercise 3.48. We can represent a “floating point” number $\hat{z}$ as a pair $(a, e)$, where $a$ and $e$ are integers; the value of $\hat{z}$ is the rational number $a 2^e$, and we call $\operatorname{len}(a)$ the precision of $\hat{z}$. We say that $\hat{z}$ is a $k$-bit approximation of a real number $z$ if $\hat{z}$ has precision $k$ and $\hat{z} = (1 + \epsilon) z$ for some $|\epsilon| \le 2^{-k+1}$. Show that given positive integers $b$ and $k$, we can compute a $k$-bit approximation of $1/b$ in time $O(M(k))$. Hint: using Newton iteration, show how to go from a $t$-bit approximation of $1/b$ to a $(2t - 2)$-bit approximation of $1/b$, making use of just the high-order $O(t)$ bits of $b$, in time $O(M(t))$. Newton iteration is a general method of iteratively approximating a root of an equation $f(x) = 0$ by starting with an initial approximation $x_0$, and computing subsequent approximations by the formula $x_{i+1} = x_i - f(x_i)/f'(x_i)$, where $f'(x)$ is the derivative of $f(x)$. For this exercise, apply Newton iteration to the function $f(x) = x^{-1} - b$.
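For $f(x) = x^{-1} - b$, the Newton step simplifies to $x_{i+1} = x_i(2 - b x_i)$, and if $x_i = (1 - \varepsilon)/b$ then $x_{i+1} = (1 - \varepsilon^2)/b$, so the relative error squares at each step. The Python sketch below illustrates only this convergence behavior: it uses exact rational arithmetic via `fractions.Fraction`, so it does not truncate to the high-order $O(t)$ bits as the exercise requires, and its running time is therefore not $O(M(k))$. The initial guess $x_0 = 2^{-\operatorname{len}(b)}$ is our own choice; it gives $0 < \varepsilon_0 \le 1/2$.

```python
from fractions import Fraction

def newton_reciprocal(b: int, steps: int) -> list[Fraction]:
    """Return the Newton iterates x_0, ..., x_steps for f(x) = 1/x - b."""
    # Since 2^(len(b)-1) <= b < 2^len(b), we get 1/2 <= b * x0 < 1.
    x = Fraction(1, 1 << b.bit_length())
    iterates = [x]
    for _ in range(steps):
        x = x * (2 - b * x)  # x - f(x)/f'(x), simplified
        iterates.append(x)
    return iterates
```

After $i$ steps the relative error $1 - b x_i$ equals $\varepsilon_0^{2^i} \le 2^{-2^i}$, so roughly $\log_2 k$ steps suffice for $k$ bits.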

Exercise 3.49. Using the result of the previous exercise, show that, given positive integers $a$ and $b$ of bit length at most $\ell$, we can compute $\lfloor a/b \rfloor$ and $a \bmod b$ in time $O(M(\ell))$. From this we see that, up to a constant factor, division with remainder is no harder than multiplication.

Exercise 3.50. Using the result of the previous exercise, give an algorithm for Exercise 3.27 that runs in time $O(M(\operatorname{len}(n))\operatorname{len}(k))$. Hint: divide and conquer.

Exercise 3.51. Give an algorithm for Exercise 3.29 whose running time is $O(M(\operatorname{len}(n)))$. Hint: Newton iteration.

Exercise 3.52. Suppose we have an algorithm that computes the square of an $\ell$-bit integer in time at most $S(\ell)$, where $S$ is a well-behaved complexity function. Show how to use this algorithm to compute the product of two arbitrary integers of length at most $\ell$ in time $O(S(\ell))$.

Exercise 3.53. Give algorithms for Exercise 3.32 whose running times are $O(M(\ell)\operatorname{len}(\ell))$, where $\ell := \operatorname{len}(n)$. Hint: divide and conquer.

3.6 Notes 

Shamir [89] shows how to factor an integer in polynomial time on a RAM, but 
where the numbers stored in the memory cells may have exponentially many 
bits. As there is no known polynomial-time factoring algorithm on any realistic 
machine, Shamir’s algorithm demonstrates the importance of restricting the sizes 
of numbers stored in the memory cells of our RAMs to keep our formal model 
realistic. 

The most practical implementations of algorithms for arithmetic on large integers are written in low-level “assembly language,” specific to a particular machine’s architecture (e.g., the GNU Multi-Precision library GMP, available at gmplib.org). Besides the general fact that such hand-crafted code is more efficient than
that produced by a compiler, there is another, more important reason for using 
assembly language. A typical 32-bit machine often comes with instructions that 
allow one to compute the 64-bit product of two 32-bit integers, and similarly, 
instructions to divide a 64-bit integer by a 32-bit integer (obtaining both the quo- 
tient and remainder). However, high-level programming languages do not (as a 
rule) provide any access to these low-level instructions. Indeed, we suggested in 
§3.3 using a value for the base B of about half the word-size of the machine, in 
order to avoid overflow. However, if one codes in assembly language, one can 
take B to be much closer, or even equal, to the word-size of the machine. Since 
our basic algorithms for multiplication and division run in time quadratic in the 
number of base-$B$ digits, the effect of doubling the bit-length of $B$ is to decrease
the running time of these algorithms by a factor of four. This effect, combined 
with the improvements one might typically expect from using assembly-language 
code, can easily lead to a five- to ten-fold decrease in the running time, compared 
to an implementation in a high-level language. This is, of course, a significant 
improvement for those interested in serious “number crunching.” 

The “classical,” quadratic-time algorithms presented here for integer multiplication and division are by no means the best possible: there are algorithms that are asymptotically faster. We saw this in the algorithm in Exercise 3.41, which was originally invented by Karatsuba [54] (although Karatsuba is one of two authors on this paper, the paper gives exclusive credit for this particular result to Karatsuba). That algorithm allows us to multiply two integers of length at most $\ell$ in time $O(\ell^{\log_2 3})$. The fastest known algorithm for multiplying such integers on a RAM runs in time $O(\ell)$, and is due to Schönhage. It actually works on a very restricted type of RAM called a “pointer machine” (see Exercise 12, Section 4.3.3 of Knuth [56]). See Exercise 17.25 later in this text for a much simpler (but heuristic) $O(\ell)$ multiplication algorithm.

Another model of computation is that of Boolean circuits. In this model of computation, one considers families of Boolean circuits (with, say, the usual “and,” “or,” and “not” gates) that compute a particular function: for every input length, there is a different circuit in the family that computes the function on inputs that are bit strings of that length. One natural notion of complexity for such circuit families is the size of the circuit (i.e., the number of gates and wires in the circuit), which is measured as a function of the input length. For many years, the smallest known Boolean circuit that multiplies two integers of length at most $\ell$ was of size $O(\ell \operatorname{len}(\ell) \operatorname{len}(\operatorname{len}(\ell)))$. This result was due to Schönhage and Strassen [86]. More recently, Fürer showed how to reduce this to $O(\ell \operatorname{len}(\ell)\, 2^{O(\log^* \ell)})$ [38]. Here, the value of $\log^* n$ is defined as the minimum number of applications of the function $\log_2$ to the number $n$ required to obtain a number that is less than or equal to 1. The function $\log^*$ is an extremely slow-growing function, and is a constant for all practical purposes.
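To get a feel for how slowly $\log^*$ grows, here is a direct Python transcription of the definition, for experimentation only (the name `log_star` is ours). It relies on the standard library's floating-point `math.log2`, which accepts arbitrarily large Python integers, so after the first step the iterates are floats and the result is exact only when no rounding crosses the threshold 1.

```python
import math

def log_star(n: float) -> int:
    """Number of applications of log2 needed to bring n down to at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

For example, $\log^* 65536 = 4$ (via $65536 \to 16 \to 4 \to 2 \to 1$), and $\log^* n \le 5$ for every $n \le 2^{65536}$.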

It is hard to say which model of computation, the RAM or circuits, is “better.”
On the one hand, the RAM very naturally models computers as we know them 
today: one stores small numbers, like array indices, counters, and pointers, in 
individual words of the machine, and processing such a number typically takes 
a single “machine cycle.” On the other hand, the RAM model, as we formally 
defined it, invites a certain kind of “cheating,” as it allows one to stuff $O(\operatorname{len}(\ell))$-bit integers into memory cells. For example, even with the simple, quadratic-time algorithms for integer arithmetic discussed in §3.3, we can choose the base $B$ to have $\operatorname{len}(\ell)$ bits, in which case these algorithms would run in time $O((\ell/\operatorname{len}(\ell))^2)$.
However, just to keep things simple, we have chosen to view B as a constant (from 
a formal, asymptotic point of view). 

In the remainder of this text, unless otherwise specified, we shall always use 
the classical $O(\ell^2)$ bounds for integer multiplication and division. These have the
advantages of being simple and of being reasonably reliable predictors of actual 
performance for small to moderately sized inputs. For relatively large numbers, 
experience shows that the classical algorithms are definitely not the best — Karat- 
suba’s multiplication algorithm, and related algorithms for division, are superior 
on inputs of a thousand bits or so (the exact crossover depends on myriad imple- 
mentation details). The even “faster” algorithms discussed above are typically not 
interesting unless the numbers involved are truly huge, of bit length around $10^5$ to $10^6$. Thus, the reader should bear in mind that for serious computations involving very large numbers, the faster algorithms are very important, even though this text does not discuss them at great length.

For a good survey of asymptotically fast algorithms for integer arithmetic, see 
Chapter 9 of Crandall and Pomerance [30], as well as Chapter 4 of Knuth [56]. 
Euclid’s algorithm


In this chapter, we discuss Euclid’s algorithm for computing greatest common 
divisors, which, as we will see, has applications far beyond that of just computing 
greatest common divisors. 


4.1 The basic Euclidean algorithm 

We consider the following problem: given two non-negative integers a and b , com- 
pute their greatest common divisor, gcd(a, b). We can do this using the well-known 
Euclidean algorithm, also called Euclid’s algorithm. 

The basic idea is the following. Without loss of generality, we may assume that $a \ge b \ge 0$. If $b = 0$, then there is nothing to do, since in this case, $\gcd(a, 0) = a$. Otherwise, $b > 0$, and we can compute the integer quotient $q := \lfloor a/b \rfloor$ and remainder $r := a \bmod b$, where $0 \le r < b$. From the equation

$$a = bq + r,$$

it is easy to see that if an integer $d$ divides both $b$ and $r$, then it also divides $a$; likewise, if an integer $d$ divides $a$ and $b$, then it also divides $r$. From this observation, it follows that $\gcd(a, b) = \gcd(b, r)$, and so by performing a division, we reduce the problem of computing $\gcd(a, b)$ to the “smaller” problem of computing $\gcd(b, r)$.
The following theorem develops this idea further: 

Theorem 4.1. Let $a, b$ be integers, with $a \ge b \ge 0$. Using the division with remainder property, define the integers $r_0, r_1, \ldots, r_{\lambda+1}$ and $q_1, \ldots, q_\lambda$, where $\lambda \ge 0$, as follows:

$$
\begin{aligned}
a &= r_0, \\
b &= r_1, \\
r_0 &= r_1 q_1 + r_2 && (0 < r_2 < r_1), \\
&\;\;\vdots \\
r_{i-1} &= r_i q_i + r_{i+1} && (0 < r_{i+1} < r_i), \\
&\;\;\vdots \\
r_{\lambda-2} &= r_{\lambda-1} q_{\lambda-1} + r_\lambda && (0 < r_\lambda < r_{\lambda-1}), \\
r_{\lambda-1} &= r_\lambda q_\lambda && (r_{\lambda+1} := 0).
\end{aligned}
$$

Note that by definition, $\lambda = 0$ if $b = 0$, and $\lambda > 0$ otherwise. Then we have $r_\lambda = \gcd(a, b)$. Moreover, if $b > 0$, then $\lambda \le \log b / \log \varphi + 1$, where $\varphi := (1 + \sqrt{5})/2 \approx 1.62$.

Proof. For the first statement, one sees that for $i = 1, \ldots, \lambda$, we have $r_{i-1} = r_i q_i + r_{i+1}$, from which it follows that the common divisors of $r_{i-1}$ and $r_i$ are the same as the common divisors of $r_i$ and $r_{i+1}$, and hence $\gcd(r_{i-1}, r_i) = \gcd(r_i, r_{i+1})$. From this, it follows that

$$\gcd(a, b) = \gcd(r_0, r_1) = \cdots = \gcd(r_\lambda, r_{\lambda+1}) = \gcd(r_\lambda, 0) = r_\lambda.$$

To prove the second statement, assume that $b > 0$, and hence $\lambda > 0$. If $\lambda = 1$, the statement is obviously true, so assume $\lambda > 1$. We claim that for $i = 0, \ldots, \lambda - 1$, we have $r_{\lambda - i} \ge \varphi^i$. The statement will then follow by setting $i = \lambda - 1$ and taking logarithms.

We now prove the above claim. For $i = 0$ and $i = 1$, we have

$$r_\lambda \ge 1 = \varphi^0 \quad\text{and}\quad r_{\lambda-1} \ge r_\lambda + 1 \ge 2 \ge \varphi^1.$$

For $i = 2, \ldots, \lambda - 1$, using induction and applying the fact that $\varphi^2 = \varphi + 1$, we have

$$r_{\lambda-i} \ge r_{\lambda-(i-1)} + r_{\lambda-(i-2)} \ge \varphi^{i-1} + \varphi^{i-2} = \varphi^{i-2}(1 + \varphi) = \varphi^i,$$

which proves the claim. □

Example 4.1. Suppose $a = 100$ and $b = 35$. Then the numbers appearing in Theorem 4.1 are easily computed as follows:

$$
\begin{array}{c|ccccc}
i & 0 & 1 & 2 & 3 & 4 \\ \hline
r_i & 100 & 35 & 30 & 5 & 0 \\
q_i & & 2 & 1 & 6 &
\end{array}
$$

So we have $\gcd(a, b) = r_3 = 5$. □

We can easily turn the scheme described in Theorem 4.1 into a simple algorithm:

Euclid’s algorithm. On input $a, b$, where $a$ and $b$ are integers such that $a \ge b \ge 0$, compute $d = \gcd(a, b)$ as follows:

    r ← a, r' ← b
    while r' ≠ 0 do
        r'' ← r mod r'
        (r, r') ← (r', r'')
    d ← r
    output d
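For readers who want to run the algorithm, here is a direct Python transcription of the pseudocode (the name `euclid_gcd` is ours). Python's `%` operator plays the role of $r \bmod r'$, and the simultaneous assignment maps onto tuple assignment; the result agrees with the standard library's `math.gcd`.

```python
def euclid_gcd(a: int, b: int) -> int:
    """Euclid's algorithm on integers a >= b >= 0."""
    if not (a >= b >= 0):
        raise ValueError("expected a >= b >= 0")
    r, r_prime = a, b
    while r_prime != 0:
        # (r, r') <- (r', r mod r'); the right-hand side is evaluated first
        r, r_prime = r_prime, r % r_prime
    return r
```

On the inputs of Example 4.1, `euclid_gcd(100, 35)` performs three divisions and returns 5.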

We now consider the running time of Euclid’s algorithm. Naively, one could estimate this as follows. Suppose $a$ and $b$ are $\ell$-bit numbers. The number of divisions performed by the algorithm is the number $\lambda$ in Theorem 4.1, which is $O(\ell)$. Moreover, each division involves numbers of $\ell$ bits or fewer in length, and so takes time $O(\ell^2)$. This leads to a bound on the running time of $O(\ell^3)$. However, as the following theorem shows, this cubic running time bound is well off the mark.
Intuitively, this is because the cost of performing a division depends on the length 
of the quotient: the larger the quotient, the more expensive the division, but also, 
the more progress the algorithm makes towards termination. 

Theorem 4.2. Euclid’s algorithm runs in time $O(\operatorname{len}(a)\operatorname{len}(b))$.

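Proof. We may assume that $b > 0$. With notation as in Theorem 4.1, the running time is $O(T)$, where

$$
\begin{aligned}
T = \sum_{i=1}^{\lambda} \operatorname{len}(r_i)\operatorname{len}(q_i) &\le \operatorname{len}(b) \sum_{i=1}^{\lambda} \operatorname{len}(q_i) \\
&\le \operatorname{len}(b) \sum_{i=1}^{\lambda} (\operatorname{len}(r_{i-1}) - \operatorname{len}(r_i) + 1) && \text{(see Exercise 3.24)} \\
&= \operatorname{len}(b)(\operatorname{len}(r_0) - \operatorname{len}(r_\lambda) + \lambda) && \text{(telescoping the sum)} \\
&\le \operatorname{len}(b)(\operatorname{len}(a) + \log b/\log \varphi + 1) && \text{(by Theorem 4.1)} \\
&= O(\operatorname{len}(a)\operatorname{len}(b)).
\end{aligned}
$$

□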

Exercise 4.1. With notation as in Theorem 4.1, give a direct and simple proof that for each $i = 1, \ldots, \lambda$, we have $r_{i+1} < r_{i-1}/2$. Thus, with every two division steps, the bit length of the remainder drops by at least 1. Based on this, give an alternative proof that the number of divisions is $O(\operatorname{len}(b))$.
Exercise 4.2. Show how to compute $\operatorname{lcm}(a, b)$ in time $O(\operatorname{len}(a)\operatorname{len}(b))$.

Exercise 4.3. Let $a, b \in \mathbb{Z}$ with $a \ge b \ge 0$, let $d := \gcd(a, b)$, and assume $d > 0$. Suppose that on input $a, b$, Euclid’s algorithm performs $\lambda$ division steps, and computes the remainder sequence $\{r_i\}_{i=0}^{\lambda+1}$ and the quotient sequence $\{q_i\}_{i=1}^{\lambda}$ (as in Theorem 4.1). Now suppose we run Euclid’s algorithm on input $a/d, b/d$. Show that on these inputs, the number of division steps performed is also $\lambda$, the remainder sequence is $\{r_i/d\}_{i=0}^{\lambda+1}$, and the quotient sequence is $\{q_i\}_{i=1}^{\lambda}$.

Exercise 4.4. Show that if we run Euclid’s algorithm on input $a, b$, where $a \ge b > 0$, then its running time is $O(\operatorname{len}(a/d)\operatorname{len}(b))$, where $d := \gcd(a, b)$.

Exercise 4.5. Let $\lambda$ be a positive integer. Show that there exist integers $a, b$ with $a > b > 0$ and $\lambda \ge \log b / \log \varphi$, such that Euclid’s algorithm on input $a, b$ performs at least $\lambda$ divisions. Thus, the bound in Theorem 4.1 on the number of divisions is essentially tight.
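One way to explore Exercise 4.5 experimentally is to run Euclid's algorithm on consecutive Fibonacci numbers. The Python sketch below (function names are ours) counts division steps on the pair $(F_{\lambda+2}, F_{\lambda+1})$, with $F_1 = F_2 = 1$; this choice of inputs is one standard construction, offered as a hint rather than as the required solution.

```python
import math

PHI = (1 + math.sqrt(5)) / 2

def division_steps(a: int, b: int) -> int:
    """Number of divisions Euclid's algorithm performs on a >= b >= 0."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps

def fibonacci_pair(lam: int) -> tuple[int, int]:
    """Return (F_{lam+2}, F_{lam+1}), where F_1 = F_2 = 1."""
    f_prev, f_curr = 1, 1  # F_1, F_2
    for _ in range(lam):
        f_prev, f_curr = f_curr, f_prev + f_curr
    return f_curr, f_prev

def log_phi(b: int) -> float:
    """The quantity log b / log phi appearing in Theorem 4.1."""
    return math.log(b) / math.log(PHI)
```

Every quotient on these inputs is 1 except the last, which is why so many division steps are needed.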

Exercise 4.6. This exercise looks at an alternative algorithm for computing 
gcd(a, b), called the binary gcd algorithm. This algorithm avoids complex opera- 
tions, such as division and multiplication; instead, it relies only on subtraction, and 
division and multiplication by powers of 2, which, assuming a binary representa- 
tion of integers (as we are), can be very efficiently implemented using “right shift” 
and “left shift” operations. The algorithm takes positive integers a and b as input, 
and runs as follows: 

    r ← a, r' ← b, e ← 0
    while 2 | r and 2 | r' do r ← r/2, r' ← r'/2, e ← e + 1
    repeat
        while 2 | r do r ← r/2
        while 2 | r' do r' ← r'/2
        if r' < r then (r, r') ← (r', r)
        r' ← r' - r
    until r' = 0
    d ← 2^e · r
    output d

Show that this algorithm correctly computes $\gcd(a, b)$, and runs in time $O(\ell^2)$, where $\ell := \max(\operatorname{len}(a), \operatorname{len}(b))$.
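A Python transcription of the binary gcd pseudocode, offered as a sketch (the name `binary_gcd` is ours). Divisions and multiplications by 2 are written as shifts, and the `repeat ... until` loop becomes a `while True` loop with an explicit exit.

```python
def binary_gcd(a: int, b: int) -> int:
    """Binary gcd of positive integers a and b, using only shifts and subtraction."""
    if a <= 0 or b <= 0:
        raise ValueError("expected positive integers")
    r, r_prime, e = a, b, 0
    # Strip the common power of 2, remembering it in e.
    while r % 2 == 0 and r_prime % 2 == 0:
        r, r_prime, e = r >> 1, r_prime >> 1, e + 1
    while True:
        while r % 2 == 0:
            r >>= 1
        while r_prime % 2 == 0:
            r_prime >>= 1
        if r_prime < r:
            r, r_prime = r_prime, r
        r_prime -= r          # difference of two odd numbers: even
        if r_prime == 0:
            break
    return r << e             # d = 2^e * r
```

For instance, `binary_gcd(48, 18)` strips one common factor of 2 and returns 6.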


4.2 The extended Euclidean algorithm 

Let a and b be integers, and let d := gcd(a, b ). We know by Theorem 1.8 that there 
exist integers s and t such that as + bt = d. The extended Euclidean algorithm 
allows us to efficiently compute s and t. The next theorem defines the quantities
computed by this algorithm, and states a number of important facts about them; 
these facts will play a crucial role, both in the analysis of the running time of the 
algorithm, as well as in applications of the algorithm that we will discuss later. 

Theorem 4.3. Let $a, b, r_0, \ldots, r_{\lambda+1}$ and $q_1, \ldots, q_\lambda$ be as in Theorem 4.1. Define integers $s_0, \ldots, s_{\lambda+1}$ and $t_0, \ldots, t_{\lambda+1}$ as follows:

$$
\begin{aligned}
s_0 &:= 1, & t_0 &:= 0, \\
s_1 &:= 0, & t_1 &:= 1, \\
s_{i+1} &:= s_{i-1} - s_i q_i, & t_{i+1} &:= t_{i-1} - t_i q_i \qquad (i = 1, \ldots, \lambda).
\end{aligned}
$$

Then:

(i) for $i = 0, \ldots, \lambda + 1$, we have $a s_i + b t_i = r_i$; in particular, $a s_\lambda + b t_\lambda = \gcd(a, b)$;

(ii) for $i = 0, \ldots, \lambda$, we have $s_i t_{i+1} - t_i s_{i+1} = (-1)^i$;

(iii) for $i = 0, \ldots, \lambda + 1$, we have $\gcd(s_i, t_i) = 1$;

(iv) for $i = 0, \ldots, \lambda$, we have $t_i t_{i+1} \le 0$ and $|t_i| \le |t_{i+1}|$; for $i = 1, \ldots, \lambda$, we have $s_i s_{i+1} \le 0$ and $|s_i| \le |s_{i+1}|$;

(v) for $i = 1, \ldots, \lambda + 1$, we have $r_{i-1}|t_i| \le a$ and $r_{i-1}|s_i| \le b$;

(vi) if $a > 0$, then for $i = 1, \ldots, \lambda + 1$, we have $|t_i| \le a$ and $|s_i| \le b$; if $a > 1$ and $b > 0$, then $|t_\lambda| \le a/2$ and $|s_\lambda| \le b/2$.


Proof. (i) is easily proved by induction on $i$. For $i = 0, 1$, the statement is clear. For $i = 2, \ldots, \lambda + 1$, we have

$$
\begin{aligned}
a s_i + b t_i &= a(s_{i-2} - s_{i-1} q_{i-1}) + b(t_{i-2} - t_{i-1} q_{i-1}) \\
&= (a s_{i-2} + b t_{i-2}) - (a s_{i-1} + b t_{i-1}) q_{i-1} \\
&= r_{i-2} - r_{i-1} q_{i-1} && \text{(by induction)} \\
&= r_i.
\end{aligned}
$$

(ii) is also easily proved by induction on $i$. For $i = 0$, the statement is clear. For $i = 1, \ldots, \lambda$, we have

$$
\begin{aligned}
s_i t_{i+1} - t_i s_{i+1} &= s_i(t_{i-1} - t_i q_i) - t_i(s_{i-1} - s_i q_i) \\
&= -(s_{i-1} t_i - t_{i-1} s_i) && \text{(after expanding and simplifying)} \\
&= -(-1)^{i-1} && \text{(by induction)} \\
&= (-1)^i.
\end{aligned}
$$

(iii) follows directly from (ii).

For (iv), one can easily prove both statements by induction on $i$. The statement involving the $t_i$’s is clearly true for $i = 0$. For $i = 1, \ldots, \lambda$, we have $t_{i+1} = t_{i-1} - t_i q_i$; moreover, by the induction hypothesis, $t_{i-1}$ and $t_i$ have opposite signs and $|t_i| \ge |t_{i-1}|$; it follows that $|t_{i+1}| = |t_{i-1}| + |t_i| q_i \ge |t_i|$, and that the sign of $t_{i+1}$ is the opposite of that of $t_i$. The proof of the statement involving the $s_i$’s is the same, except that we start the induction at $i = 1$.

For (v), one considers the two equations:

$$
\begin{aligned}
a s_{i-1} + b t_{i-1} &= r_{i-1}, \\
a s_i + b t_i &= r_i.
\end{aligned}
$$

Subtracting $t_{i-1}$ times the second equation from $t_i$ times the first, and applying (ii), we get $\pm a = t_i r_{i-1} - t_{i-1} r_i$; consequently, using the fact that $t_i$ and $t_{i-1}$ have opposite sign, we obtain

$$a = |t_i r_{i-1} - t_{i-1} r_i| = |t_i| r_{i-1} + |t_{i-1}| r_i \ge |t_i| r_{i-1}.$$

The inequality involving $s_i$ follows similarly, subtracting $s_{i-1}$ times the second equation from $s_i$ times the first.

(vi) follows from (v) and the following observations: if $a > 0$, then $r_{i-1} > 0$ for $i = 1, \ldots, \lambda + 1$; if $a > 1$ and $b > 0$, then $\lambda > 0$ and $r_{\lambda-1} \ge 2$. □

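Example 4.2. We continue with Example 4.1. The $s_i$’s and $t_i$’s are easily computed from the $q_i$’s:

$$
\begin{array}{c|ccccc}
i & 0 & 1 & 2 & 3 & 4 \\ \hline
r_i & 100 & 35 & 30 & 5 & 0 \\
q_i & & 2 & 1 & 6 & \\
s_i & 1 & 0 & 1 & -1 & 7 \\
t_i & 0 & 1 & -2 & 3 & -20
\end{array}
$$

So we have $\gcd(a, b) = 5 = -a + 3b$. □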

We can easily turn the scheme described in Theorem 4.3 into a simple algorithm:

The extended Euclidean algorithm. On input $a, b$, where $a$ and $b$ are integers such that $a \ge b \ge 0$, compute integers $d$, $s$, and $t$, such that $d = \gcd(a, b)$ and $as + bt = d$, as follows:

    r ← a, r' ← b
    s ← 1, s' ← 0
    t ← 0, t' ← 1
    while r' ≠ 0 do
        q ← ⌊r/r'⌋, r'' ← r mod r'
        (r, s, t, r', s', t') ← (r', s', t', r'', s - s'q, t - t'q)
    d ← r
    output d, s, t
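The extended algorithm translates just as directly into Python (a sketch; `extended_euclid` is our name). Python's tuple assignment evaluates the whole right-hand side before assigning, which matches the simultaneous assignment in the pseudocode, and `divmod` returns $\lfloor r/r' \rfloor$ and $r \bmod r'$ together.

```python
def extended_euclid(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) and a*s + b*t = d, for a >= b >= 0."""
    if not (a >= b >= 0):
        raise ValueError("expected a >= b >= 0")
    r, r_p = a, b
    s, s_p = 1, 0
    t, t_p = 0, 1
    while r_p != 0:
        q, r_pp = divmod(r, r_p)
        r, s, t, r_p, s_p, t_p = r_p, s_p, t_p, r_pp, s - s_p * q, t - t_p * q
    return r, s, t
```

On the inputs of Example 4.2, `extended_euclid(100, 35)` returns `(5, -1, 3)`, matching $5 = -a + 3b$.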
Theorem 4.4. The extended Euclidean algorithm runs in time $O(\operatorname{len}(a)\operatorname{len}(b))$.


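Proof. We may assume that $b > 0$. It suffices to analyze the cost of computing the coefficient sequences $\{s_i\}$ and $\{t_i\}$. Consider first the cost of computing all of the $t_i$’s, which is $O(T)$, where $T = \sum_{i=1}^{\lambda} \operatorname{len}(t_i)\operatorname{len}(q_i)$. We have $t_1 = 1$ and, by part (vi) of Theorem 4.3, we have $|t_i| \le a$ for $i = 2, \ldots, \lambda$. Arguing as in the proof of Theorem 4.2, we have

$$
\begin{aligned}
T &\le \operatorname{len}(q_1) + \operatorname{len}(a) \sum_{i=2}^{\lambda} \operatorname{len}(q_i) \\
&\le \operatorname{len}(a) + \operatorname{len}(a)(\operatorname{len}(r_1) - \operatorname{len}(r_\lambda) + \lambda - 1) = O(\operatorname{len}(a)\operatorname{len}(b)).
\end{aligned}
$$

An analogous argument shows that one can also compute all of the $s_i$’s in time $O(\operatorname{len}(a)\operatorname{len}(b))$, and in fact, in time $O(\operatorname{len}(b)^2)$. □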


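For the reader familiar with the basics of the theory of matrices and determinants, it is instructive to view Theorem 4.3 as follows. For $i = 1, \ldots, \lambda$, we have

$$
\begin{pmatrix} r_i \\ r_{i+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix} \begin{pmatrix} r_{i-1} \\ r_i \end{pmatrix}.
$$

Recursively expanding the right-hand side of this equation, we have

$$
\begin{pmatrix} r_i \\ r_{i+1} \end{pmatrix} = M_i \begin{pmatrix} a \\ b \end{pmatrix}, \quad\text{where}\quad M_i := \begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix} \cdots \begin{pmatrix} 0 & 1 \\ 1 & -q_1 \end{pmatrix}.
$$

This defines the $2 \times 2$ matrix $M_i$ for $i = 1, \ldots, \lambda$. If we additionally define $M_0$ to be the $2 \times 2$ identity matrix, then it is easy to see that for $i = 0, \ldots, \lambda$, we have

$$
M_i = \begin{pmatrix} s_i & t_i \\ s_{i+1} & t_{i+1} \end{pmatrix}.
$$

From these observations, part (i) of Theorem 4.3 is immediate, and part (ii) follows from the fact that $M_i$ is the product of $i$ matrices, each of determinant $-1$, and the determinant of $M_i$ is evidently $s_i t_{i+1} - t_i s_{i+1}$.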
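To check the matrix interpretation numerically, the following Python sketch (the name `euclid_matrix` is ours) accumulates $M_\lambda$ by repeatedly left-multiplying by $\begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix}$, starting from the identity $M_0$. For $a = 100$, $b = 35$, it should produce $\begin{pmatrix} s_3 & t_3 \\ s_4 & t_4 \end{pmatrix} = \begin{pmatrix} -1 & 3 \\ 7 & -20 \end{pmatrix}$, in agreement with Example 4.2, with determinant $(-1)^3 = -1$.

```python
def euclid_matrix(a: int, b: int) -> list[list[int]]:
    """Return M_lambda = [[s_l, t_l], [s_{l+1}, t_{l+1}]] for inputs a >= b >= 0."""
    M = [[1, 0], [0, 1]]  # M_0 is the identity
    r, r_p = a, b
    while r_p != 0:
        q, rem = divmod(r, r_p)
        # Left-multiplying by [[0, 1], [1, -q]]:
        #   new row 0 = old row 1,
        #   new row 1 = old row 0 - q * old row 1.
        M = [M[1][:], [M[0][0] - q * M[1][0], M[0][1] - q * M[1][1]]]
        r, r_p = r_p, rem
    return M
```

Applying the result to the column vector $(a, b)$ gives $(r_\lambda, r_{\lambda+1}) = (\gcd(a, b), 0)$.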


Exercise 4.7. In our description of the extended Euclidean algorithm, we made 
the restriction that the inputs $a$ and $b$ satisfy $a \ge b \ge 0$. Using this restricted
algorithm as a subroutine, give an algorithm that works without any restrictions on 
its input. 

Exercise 4.8. With notation and assumptions as in Exercise 4.3, suppose that on input $a, b$, the extended Euclidean algorithm computes the coefficient sequences $\{s_i\}_{i=0}^{\lambda+1}$ and $\{t_i\}_{i=0}^{\lambda+1}$ (as in Theorem 4.3). Show that the extended Euclidean algorithm on input $a/d, b/d$ computes the same coefficient sequences.

Exercise 4.9. Assume notation as in Theorem 4.3. Show that:

(a) for all $i = 2, \ldots, \lambda$, we have $|t_i| < |t_{i+1}|$ and $r_{i-1}|t_i| < a$, and that for all $i = 3, \ldots, \lambda$, we have $|s_i| < |s_{i+1}|$ and $r_{i-1}|s_i| < b$;

(b) $s_i t_i \le 0$ for $i = 0, \ldots, \lambda + 1$;

(c) if $d := \gcd(a, b) > 0$, then $|s_{\lambda+1}| = b/d$ and $|t_{\lambda+1}| = a/d$.

Exercise 4.10. One can extend the binary gcd algorithm discussed in Exercise 4.6 so that in addition to computing $d = \gcd(a, b)$, it also computes $s$ and $t$ such that $as + bt = d$. Here is one way to do this (again, we assume that $a$ and $b$ are positive integers):

    r ← a, r' ← b, e ← 0
    while 2 | r and 2 | r' do r ← r/2, r' ← r'/2, e ← e + 1
    ã ← r, b̃ ← r', s ← 1, t ← 0, s' ← 0, t' ← 1
    repeat
        while 2 | r do
            r ← r/2
            if 2 | s and 2 | t then s ← s/2, t ← t/2
                else s ← (s + b̃)/2, t ← (t - ã)/2
        while 2 | r' do
            r' ← r'/2
            if 2 | s' and 2 | t' then s' ← s'/2, t' ← t'/2
                else s' ← (s' + b̃)/2, t' ← (t' - ã)/2
        if r' < r then (r, s, t, r', s', t') ← (r', s', t', r, s, t)
        r' ← r' - r, s' ← s' - s, t' ← t' - t
    until r' = 0
    d ← 2^e · r, output d, s, t

Show that this algorithm is correct and that its running time is $O(\ell^2)$, where $\ell := \max(\operatorname{len}(a), \operatorname{len}(b))$. In particular, you should verify that all of the divisions by 2 performed by the algorithm yield integer results. Moreover, show that the outputs $s$ and $t$ are of length $O(\ell)$.

Exercise 4.11. Suppose we modify the extended Euclidean algorithm so that it computes balanced remainders; that is, for $i = 1, \ldots, \lambda$, the values $q_i$ and $r_{i+1}$ are computed so that $r_{i-1} = r_i q_i + r_{i+1}$ and $-|r_i|/2 \le r_{i+1} < |r_i|/2$. Assume that the $s_i$’s and the $t_i$’s are computed by the same formula as in Theorem 4.3. Give a detailed analysis of the running time of this algorithm, which should include an analysis of the number of division steps, and the sizes of the $s_i$’s and $t_i$’s.
4.3 Computing modular inverses and Chinese remaindering

An important application of the extended Euclidean algorithm is to the problem of computing multiplicative inverses in $\mathbb{Z}_n$.

Theorem 4.5. Suppose we are given integers $n, b$, where $0 \le b < n$. Then in time $O(\operatorname{len}(n)^2)$, we can determine if $b$ is relatively prime to $n$, and if so, compute $b^{-1} \bmod n$.

Proof. We may assume $n > 1$, since when $n = 1$, we have $b = 0 = b^{-1} \bmod n$. We run the extended Euclidean algorithm on input $n, b$, obtaining integers $d$, $s$, and $t$, such that $d = \gcd(n, b)$ and $ns + bt = d$. If $d \ne 1$, then $b$ does not have a multiplicative inverse modulo $n$. Otherwise, if $d = 1$, then $t$ is a multiplicative inverse of $b$ modulo $n$; however, it may not lie in the range $\{0, \ldots, n - 1\}$, as required. By part (vi) of Theorem 4.3, we have $|t| \le n/2 < n$. Thus, if $t \ge 0$, then $b^{-1} \bmod n$ is equal to $t$; otherwise, $b^{-1} \bmod n$ is equal to $t + n$. Based on Theorem 4.4, it is clear that all the computations can be performed in time $O(\operatorname{len}(n)^2)$. □
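A Python sketch of the procedure in the proof of Theorem 4.5 (the name `mod_inverse` is ours). Since only $t$ is needed, the loop tracks just the $t_i$ coefficients of the extended Euclidean algorithm run on input $n, b$. Python 3.8 and later also provide the built-in `pow(b, -1, n)`, which computes the same inverse and raises `ValueError` when none exists.

```python
from typing import Optional

def mod_inverse(b: int, n: int) -> Optional[int]:
    """Return b^{-1} mod n for 0 <= b < n, or None if gcd(n, b) != 1."""
    if not (0 <= b < n):
        raise ValueError("expected 0 <= b < n")
    if n == 1:
        return 0  # in Z_1, b = 0 = b^{-1} mod 1
    # Extended Euclid on (n, b), keeping only the t-coefficients:
    # t_0 = 0 pairs with r_0 = n, and t_1 = 1 pairs with r_1 = b.
    r, r_p = n, b
    t, t_p = 0, 1
    while r_p != 0:
        q, r_pp = divmod(r, r_p)
        r, t, r_p, t_p = r_p, t_p, r_pp, t - t_p * q
    if r != 1:
        return None  # gcd(n, b) != 1: no inverse exists
    return t if t >= 0 else t + n  # |t| < n, so one correction suffices
```

For example, `mod_inverse(3, 7)` finds $t = -2$ and returns $-2 + 7 = 5$.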

Example 4.3. Suppose we are given integers $a, b, n$, where $0 < a < n$ and $0 \le b < n$, and we want to compute a solution $z$ to the congruence $az \equiv b \pmod{n}$, or determine that no such solution exists. Based on the discussion in Example 2.5, the following algorithm does the job:

    d ← gcd(a, n)
    if d ∤ b then
        output “no solution”
    else
        a' ← a/d, b' ← b/d, n' ← n/d
        t ← (a')^{-1} mod n'
        z ← tb' mod n'
        output z

Using Euclid’s algorithm to compute $d$, and the extended Euclidean algorithm to compute $t$ (as in Theorem 4.5), the running time of this algorithm is clearly $O(\operatorname{len}(n)^2)$. □
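The algorithm of Example 4.3 in Python, as a sketch: here `math.gcd` stands in for Euclid's algorithm and the built-in `pow(a1, -1, n1)` (Python 3.8+) stands in for the inversion of Theorem 4.5. The function name is ours, and returning `None` is our convention for "no solution".

```python
import math
from typing import Optional

def solve_linear_congruence(a: int, b: int, n: int) -> Optional[int]:
    """Find z with a*z = b (mod n), for 0 < a < n and 0 <= b < n, or return None."""
    d = math.gcd(a, n)
    if b % d != 0:
        return None                  # d does not divide b: no solution
    a1, b1, n1 = a // d, b // d, n // d
    t = pow(a1, -1, n1)              # gcd(a1, n1) = 1, so the inverse exists
    return t * b1 % n1
```

For instance, for $6z \equiv 4 \pmod{10}$ we get $d = 2$, $a' = 3$, $b' = 2$, $n' = 5$, $t = 2$, and $z = 4$; indeed $6 \cdot 4 = 24 \equiv 4 \pmod{10}$.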

We also observe that the Chinese remainder theorem (Theorem 2.6) can be made computationally effective:

Theorem 4.6 (Effective Chinese remainder theorem). Suppose we are given integers $n_1, \ldots, n_k$ and $a_1, \ldots, a_k$, where the family $\{n_i\}_{i=1}^{k}$ is pairwise relatively prime, and where $n_i > 1$ and $0 \le a_i < n_i$ for $i = 1, \ldots, k$. Let $n := \prod_{i=1}^{k} n_i$. Then in time $O(\operatorname{len}(n)^2)$, we can compute the unique integer $a$ satisfying $0 \le a < n$ and $a \equiv a_i \pmod{n_i}$ for $i = 1, \ldots, k$.

Proof. The algorithm is a straightforward implementation of the proof of Theorem 2.6, and runs as follows:

    n ← ∏_{i=1}^{k} n_i
    for i ← 1 to k do
        n*_i ← n/n_i, b_i ← n*_i mod n_i, t_i ← b_i^{-1} mod n_i, e_i ← n*_i t_i
    a ← (∑_{i=1}^{k} a_i e_i) mod n

We leave it to the reader to verify the running time bound. □
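A Python sketch of the algorithm in the proof of Theorem 4.6 (the name `crt` is ours). It uses `math.prod` and `pow(b_i, -1, n_i)` (both Python 3.8+) in place of hand-written product and inversion routines; `pow` raises `ValueError` if the moduli are not pairwise relatively prime, since then some $b_i$ is not invertible modulo $n_i$.

```python
from math import prod

def crt(residues: list[int], moduli: list[int]) -> int:
    """Return the unique a in [0, n) with a = residues[i] (mod moduli[i]) for all i."""
    n = prod(moduli)
    a = 0
    for a_i, n_i in zip(residues, moduli):
        n_star = n // n_i
        b_i = n_star % n_i
        t_i = pow(b_i, -1, n_i)
        e_i = n_star * t_i   # e_i = 1 (mod n_i) and e_i = 0 (mod n_j) for j != i
        a += a_i * e_i
    return a % n
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns 23.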

Exercise 4.12. In Example 4.3, show that one can easily obtain the quantities $d$, $a'$, $n'$, and $t$ from the data computed in just a single execution of the extended Euclidean algorithm.

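Exercise 4.13. In this exercise, you are to make the result of Theorem 2.17 effective. Suppose that we are given a positive integer $n$, two elements $\alpha, \beta \in \mathbb{Z}_n^*$, and integers $\ell$ and $m$, such that $\alpha^\ell = \beta^m$ and $\gcd(\ell, m) = 1$. Show how to compute $\gamma \in \mathbb{Z}_n^*$ such that $\alpha = \gamma^m$ in time $O(\operatorname{len}(\ell)\operatorname{len}(m) + (\operatorname{len}(\ell) + \operatorname{len}(m))\operatorname{len}(n)^2)$.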

Exercise 4.14. In this exercise and the next, you are to analyze an “incremental Chinese remaindering algorithm.” Consider the following algorithm, which takes as input integers $a_1, n_1, a_2, n_2$ satisfying

$$0 \le a_1 < n_1, \quad 0 \le a_2 < n_2, \quad\text{and}\quad \gcd(n_1, n_2) = 1.$$

It outputs integers $a, n$ satisfying

$$n = n_1 n_2, \quad 0 \le a < n, \quad a \equiv a_1 \pmod{n_1}, \quad\text{and}\quad a \equiv a_2 \pmod{n_2},$$

and runs as follows:

    b ← n_1 mod n_2, t ← b^{-1} mod n_2, h ← (a_2 - a_1)t mod n_2
    a ← a_1 + n_1 h, n ← n_1 n_2
    output a, n

Show that the algorithm correctly computes $a$ and $n$ as specified, and runs in time $O(\operatorname{len}(n)\operatorname{len}(n_2))$.

Exercise 4.15. Using the algorithm in the previous exercise as a subroutine, give a simple $O(\operatorname{len}(n)^2)$ algorithm that takes as input integers $n_1, \ldots, n_k$ and $a_1, \ldots, a_k$, where the family $\{n_i\}_{i=1}^{k}$ is pairwise relatively prime, and where $n_i > 1$ and $0 \le a_i < n_i$ for $i = 1, \ldots, k$, and outputs integers $a$ and $n$ such that $0 \le a < n$, $n = \prod_{i=1}^{k} n_i$, and $a \equiv a_i \pmod{n_i}$ for $i = 1, \ldots, k$. The algorithm should be “incremental,” in that it processes the pairs $(a_i, n_i)$ one at a time, using time $O(\operatorname{len}(n)\operatorname{len}(n_i))$ per pair.
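A possible Python sketch covering Exercises 4.14 and 4.15 together (function names are ours): `crt_pair` transcribes the two-modulus step, and `crt_incremental` folds it over the input pairs, starting from the trivial system $a \equiv 0 \pmod{1}$. Modular inversion is again delegated to the built-in `pow(b, -1, n2)` (Python 3.8+).

```python
def crt_pair(a1: int, n1: int, a2: int, n2: int) -> tuple[int, int]:
    """Combine a = a1 (mod n1) and a = a2 (mod n2), with gcd(n1, n2) = 1."""
    b = n1 % n2
    t = pow(b, -1, n2)        # t = n1^{-1} mod n2
    h = (a2 - a1) * t % n2    # Python's % already yields a value in [0, n2)
    return a1 + n1 * h, n1 * n2

def crt_incremental(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Process the pairs (a_i, n_i) one at a time, as in Exercise 4.15."""
    a, n = 0, 1
    for a_i, n_i in zip(residues, moduli):
        a, n = crt_pair(a, n, a_i, n_i)
    return a, n
```

Since $0 \le h < n_2$, the output satisfies $0 \le a \le (n_1 - 1) + n_1(n_2 - 1) = n_1 n_2 - 1$, so no final reduction is needed.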




Exercise 4.16. Suppose we are given $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n^*$. Show how to compute $\alpha_1^{-1}, \ldots, \alpha_k^{-1}$ by computing one multiplicative inverse modulo $n$, and performing fewer than $3k$ multiplications modulo $n$. This result is useful, as in practice, if $n$ is several hundred bits long, it may take 10 to 20 times longer to compute multiplicative inverses modulo $n$ than to multiply modulo $n$.
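The usual solution to Exercise 4.16 is often called Montgomery's trick. Here is one Python sketch of it (the name `batch_inverse` is ours, and the single inversion is delegated to the built-in `pow(x, -1, n)`, Python 3.8+): it forms prefix products, inverts only the final product, and then peels off one inverse per element, for $3(k - 1)$ multiplications modulo $n$ in total.

```python
def batch_inverse(alphas: list[int], n: int) -> list[int]:
    """Invert every element of alphas modulo n using a single modular inversion."""
    k = len(alphas)
    if k == 0:
        return []
    # prefix[i] = alphas[0] * ... * alphas[i] mod n   (k - 1 multiplications)
    prefix = [alphas[0] % n]
    for x in alphas[1:]:
        prefix.append(prefix[-1] * x % n)
    inv = pow(prefix[-1], -1, n)  # the one modular inversion
    result = [0] * k
    # Invariant: at the start of iteration i, inv = (alphas[0] ... alphas[i])^{-1}.
    for i in range(k - 1, 0, -1):                # 2(k - 1) multiplications
        result[i] = inv * prefix[i - 1] % n      # = alphas[i]^{-1}
        inv = inv * alphas[i] % n                # = (alphas[0] ... alphas[i-1])^{-1}
    result[0] = inv
    return result
```

The trade is one inversion plus roughly three multiplications per element, instead of $k$ separate inversions.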


4.4 Speeding up algorithms via modular computation 

An important practical application of the above “computational” version (Theo- 
rem 4.6) of the Chinese remainder theorem is a general algorithmic technique that 
can significantly speed up certain types of computations involving long integers. 
Instead of trying to describe the technique in some general form, we simply illus- 
trate the technique by means of a specific example: integer matrix multiplication. 

Suppose we have two $m \times m$ matrices $A$ and $B$ whose entries are large integers, and we want to compute the product matrix $C := AB$. Suppose that for $r, s = 1, \ldots, m$, the entry of $A$ at row $r$ and column $s$ is $a_{rs}$, and that for $s, t = 1, \ldots, m$, the entry of $B$ at row $s$ and column $t$ is $b_{st}$. Then for $r, t = 1, \ldots, m$, the entry of $C$ at row $r$ and column $t$ is $c_{rt}$, which is given by the usual rule for matrix multiplication:

$$c_{rt} = \sum_{s=1}^{m} a_{rs} b_{st}. \tag{4.1}$$

Suppose further that $M$ is the maximum absolute value of the entries in $A$ and $B$, so that the entries in $C$ are bounded in absolute value by $M' := M^2 m$. Let $\ell := \operatorname{len}(M)$. To simplify calculations, let us also assume that $m \le M$ (this is reasonable, as we want to consider large values of $M$, greater than say $2^{100}$, and certainly, we cannot expect to work with $2^{100} \times 2^{100}$ matrices).

By just applying the formula (4.1), we can compute the entries of $C$ using $m^3$ multiplications of numbers of length at most $\ell$, and $m^3$ additions of numbers of length at most $\operatorname{len}(M')$, where $\operatorname{len}(M') \le 2\ell + \operatorname{len}(m) = O(\ell)$. This yields a running time of

$$O(m^3 \ell^2). \tag{4.2}$$

Using the Chinese remainder theorem, we can actually do much better than this, as 
follows. 

For every integer $n > 1$, and for all $r, t = 1, \ldots, m$, we have

$$c_{rt} \equiv \sum_{s=1}^{m} a_{rs} b_{st} \pmod{n}. \tag{4.3}$$
Moreover, if we compute integers $c'_{rt}$ such that

$$c'_{rt} \equiv \sum_{s=1}^{m} a_{rs} b_{st} \pmod{n} \tag{4.4}$$

and if we also have

$$-n/2 \le c'_{rt} < n/2 \quad\text{and}\quad n > 2M', \tag{4.5}$$

then we must have

$$c_{rt} = c'_{rt}. \tag{4.6}$$

To see why (4.6) follows from (4.4) and (4.5), observe that (4.3) and (4.4) imply that $c_{rt} \equiv c'_{rt} \pmod{n}$, which means that $n$ divides $(c_{rt} - c'_{rt})$. Then from the bound $|c_{rt}| \le M'$ and from (4.5), we obtain

$$|c_{rt} - c'_{rt}| \le |c_{rt}| + |c'_{rt}| \le M' + n/2 < n/2 + n/2 = n.$$

So we see that the quantity $(c_{rt} - c'_{rt})$ is a multiple of $n$, while at the same time this quantity is strictly less than $n$ in absolute value; hence, this quantity must be zero. That proves (4.6).

So from the above discussion, to compute C, it suffices to compute the entries 
of C modulo n, where we have to make sure that we compute “balanced” remain- 
ders in the interval [-n/2, n/2), rather than the more usual “least non-negative” 
remainders. 

To compute $C$ modulo $n$, we choose a number of small integers $n_1, \ldots, n_k$, such that the family $\{n_i\}_{i=1}^{k}$ is pairwise relatively prime, and the product $n := \prod_{i=1}^{k} n_i$ is just a bit larger than $2M'$. In practice, one would choose the $n_i$'s to be small primes, and a table of such primes could easily be computed in advance, so that all problems up to a given size could be handled. For example, the product of all primes of at most 16 bits is a number that has more than 90,000 bits. Thus, by simply pre-computing and storing a table of small primes, we can handle input matrices with quite large entries (up to about 45,000 bits).

Let us assume that we have pre-computed appropriate small primes $n_1, \ldots, n_k$. Further, we shall assume that addition and multiplication modulo each $n_i$ can be done in constant time. This is reasonable from a practical (and theoretical) point of view, since such primes easily “fit” into a machine word, and we can perform modular addition and multiplication using a constant number of built-in machine operations. Finally, we assume that we do not use more $n_i$'s than are necessary, so that $\operatorname{len}(n) = O(\ell)$ and $k = O(\ell)$.

To compute C, we execute the following steps: 
1. For each $i = 1, \ldots, k$, do the following:

(a) compute $\hat{a}^{(i)}_{rs} \leftarrow a_{rs} \bmod n_i$, for $r, s = 1, \ldots, m$,

(b) compute $\hat{b}^{(i)}_{st} \leftarrow b_{st} \bmod n_i$, for $s, t = 1, \ldots, m$,

(c) for $r, t = 1, \ldots, m$, compute

$$\hat{c}^{(i)}_{rt} \leftarrow \sum_{s=1}^{m} \hat{a}^{(i)}_{rs} \hat{b}^{(i)}_{st} \bmod n_i.$$

2. For each $r, t = 1, \ldots, m$, apply the Chinese remainder theorem to $\hat{c}^{(1)}_{rt}, \ldots, \hat{c}^{(k)}_{rt}$, obtaining an integer $c_{rt}$, which should be computed as a balanced remainder modulo $n$, so that $-n/2 \le c_{rt} < n/2$.

3. Output the matrix $C$, whose entry in row $r$ and column $t$ is $c_{rt}$.

Note that in step 2, if our Chinese remainder algorithm happens to be implemented to return an integer $a$ with $0 \le a < n$, we can easily get a balanced remainder by just subtracting $n$ from $a$ if $a \ge n/2$.

The correctness of the above algorithm has already been established. Let us now analyze its running time. The running time of steps 1a and 1b is easily seen to be $O(m^2 \ell^2)$. Under our assumption about the cost of arithmetic modulo small primes, the cost of step 1c is $O(m^3 k)$, and since $k = O(\ell)$, the cost of this step is $O(m^3 \ell)$. Finally, by Theorem 4.6, the cost of step 2 is $O(m^2 \ell^2)$. Thus, the total running time of this algorithm is

$$O(m^2 \ell^2 + m^3 \ell).$$

This is a significant improvement over (4.2); for example, if $\ell \approx m$, then the running time of the original algorithm is $O(m^5)$, while the running time of the modular algorithm is $O(m^4)$.
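For concreteness, here is a minimal Python sketch of steps 1 through 3 above. It assumes Python 3.8 or later (for `math.prod` and the modular inverse `pow(x, -1, m)`). The function names `crt_balanced` and `modular_matmul` are our own. The Chinese remaindering uses the standard interpolation formula rather than necessarily the algorithm behind Theorem 4.6, and Python's built-in integers stand in for the word-sized arithmetic assumed in the analysis. The caller must supply distinct primes whose product exceeds $2M'$.

```python
from math import prod

def crt_balanced(residues, moduli):
    """Return the unique c with c = residues[i] (mod moduli[i]) for all i
    and -n/2 <= c < n/2, where n = prod(moduli) (moduli pairwise coprime)."""
    n = prod(moduli)
    c = 0
    for c_i, n_i in zip(residues, moduli):
        m_i = n // n_i
        # m_i * (m_i^{-1} mod n_i) is 1 modulo n_i and 0 modulo every other n_j
        c += c_i * m_i * pow(m_i, -1, n_i)
    c %= n                     # least non-negative remainder: 0 <= c < n
    if 2 * c >= n:             # shift into the balanced range [-n/2, n/2)
        c -= n
    return c

def modular_matmul(A, B, primes):
    """Multiply m x m integer matrices A and B by the modular algorithm.
    Assumes the primes are distinct and their product exceeds 2 * M^2 * m,
    where M bounds the absolute values of the entries of A and B."""
    m = len(A)
    images = []
    for p in primes:                                            # step 1
        Ap = [[x % p for x in row] for row in A]                # step 1a
        Bp = [[x % p for x in row] for row in B]                # step 1b
        Cp = [[sum(Ap[r][s] * Bp[s][t] for s in range(m)) % p  # step 1c
               for t in range(m)] for r in range(m)]
        images.append(Cp)
    return [[crt_balanced([image[r][t] for image in images], primes)  # step 2
             for t in range(m)] for r in range(m)]                    # step 3
```

For instance, `modular_matmul([[3, -7], [12, 5]], [[-4, 9], [2, -11]], [101, 103, 107])` returns `[[-26, 104], [-38, 53]]`, which agrees with the ordinary product. Here $M = 12$ and $m = 2$, so $2M' = 576$ is far below $101 \cdot 103 \cdot 107$.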


Exercise 4.17. Apply the ideas above to the problem of computing the product 
of two polynomials whose coefficients are large integers. First, determine the run- 
ning time of the “obvious” algorithm for multiplying two such polynomials, then 
design and analyze a “modular” algorithm. 


4.5 An effective version of Fermat’s two squares theorem 

We proved in Theorem 2.34 (in §2.8.4) that every prime $p \equiv 1 \pmod 4$ can be expressed as a sum of two squares of integers. In this section, we make this theorem computationally effective; that is, we develop an efficient algorithm that takes as input a prime $p \equiv 1 \pmod 4$, and outputs integers $r$ and $t$ such that $p = r^2 + t^2$.
One essential ingredient in the proof of Theorem 2.34 was Thue’s lemma (Theorem 2.33). This lemma asserts the existence of certain numbers, and we proved it using the “pigeonhole principle,” which unfortunately does not translate directly into an efficient algorithm to actually find these numbers. However, we can show that these numbers arise as a “natural by-product” of the extended Euclidean algorithm. To make this more precise, let us introduce some notation. For integers $a, b$, with $a \ge b \ge 0$, let us define

$$\operatorname{EEA}(a, b) := \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1},$$

where $r_i$, $s_i$, and $t_i$, for $i = 0, \ldots, \lambda + 1$, are defined as in Theorem 4.3.

Theorem 4.7 (Effective Thue’s lemma). Let $n, b, r^*, t^* \in \mathbb{Z}$, with $0 \le b < n$ and $0 < r^* \le n < r^* t^*$. Further, let $\operatorname{EEA}(n, b) = \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1}$, and let $j$ be the smallest index (among $0, \ldots, \lambda + 1$) such that $r_j < r^*$. Then, setting $r := r_j$ and $t := t_j$, we have

$$r \equiv bt \pmod{n}, \quad 0 \le r < r^*, \quad\text{and}\quad 0 < |t| < t^*.$$

Proof. Since $r_0 = n \ge r^* > 0 = r_{\lambda+1}$, the value of the index $j$ is well defined; moreover, $j \ge 1$ and $r_{j-1} \ge r^*$. It follows that

$$\begin{aligned}
|t_j| &\le n / r_{j-1} && \text{(by part (v) of Theorem 4.3)} \\
&\le n / r^* \\
&< t^* && \text{(since } n < r^* t^*\text{)}.
\end{aligned}$$

Since $j \ge 1$, by part (iv) of Theorem 4.3, we have $|t_j| \ge |t_1| > 0$. Finally, since $r_j = n s_j + b t_j$, we have $r_j \equiv b t_j \pmod{n}$. □

What this theorem says is that given $n, b, r^*, t^*$, to find the desired values $r$ and $t$, we run the extended Euclidean algorithm on input $n, b$. This generates a sequence of remainders $r_0 > r_1 > r_2 > \cdots$, where $r_0 = n$ and $r_1 = b$. If $r_j$ is the first remainder in this sequence that falls below $r^*$, and if $s_j$ and $t_j$ are the corresponding numbers computed by the extended Euclidean algorithm, then $r := r_j$ and $t := t_j$ do the job.
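The following is a minimal Python sketch of this procedure. The names `eea_sequence` and `effective_thue` are hypothetical. The code records the triples $(r_i, s_i, t_i)$ of Theorem 4.3 and returns the first pair $(r_j, t_j)$ whose remainder falls below $r^*$. It assumes, without checking, that the hypotheses of Theorem 4.7 hold.

```python
def eea_sequence(a, b):
    """Triples (r_i, s_i, t_i), i = 0, ..., lambda + 1, computed by the extended
    Euclidean algorithm on a >= b >= 0; each satisfies r_i = a*s_i + b*t_i."""
    seq = [(a, 1, 0), (b, 0, 1)]
    while seq[-1][0] != 0:
        (r0, s0, t0), (r1, s1, t1) = seq[-2], seq[-1]
        q = r0 // r1
        seq.append((r0 - q * r1, s0 - q * s1, t0 - q * t1))
    return seq

def effective_thue(n, b, r_star, t_star):
    """Theorem 4.7: assuming 0 <= b < n and 0 < r_star <= n < r_star * t_star,
    return (r, t) with r = b*t (mod n), 0 <= r < r_star and 0 < |t| < t_star.
    (t_star is not needed for the search; it only appears in the guarantee.)"""
    for r, _s, t in eea_sequence(n, b):
        if r < r_star:          # first remainder that falls below r_star
            return r, t
    raise ValueError("hypotheses of Theorem 4.7 violated")
```

With the data of Example 4.4 below, `effective_thue(1009, 469, 32, 32)` returns `(28, -15)`.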

The other essential ingredient in the proof of Theorem 2.34 was Theorem 2.31, 
which guarantees the existence of a square root of -1 modulo p when p is a prime 
congruent to 1 modulo 4. We need an effective version of this result as well. Later, 
in Chapter 12, we will study the general problem of computing square roots modulo 
primes. Right now, we develop an algorithm for this special case. 

Assume we are given a prime $p \equiv 1 \pmod 4$, and we want to compute $\beta \in \mathbb{Z}_p^*$ such that $\beta^2 = -1$. By Theorem 2.32, it suffices to find $\gamma \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$, since then $\beta := \gamma^{(p-1)/4}$ (which we can efficiently compute via repeated squaring) satisfies $\beta^2 = -1$. While there is no known efficient, deterministic algorithm to find such a $\gamma$, we do know that half the elements of $\mathbb{Z}_p^*$ are squares and half are not (see Theorem 2.20). This suggests the following simple “trial and error” strategy to compute $\beta$:

repeat
    choose $\gamma \in \mathbb{Z}_p^*$
    compute $\beta \leftarrow \gamma^{(p-1)/4}$
until $\beta^2 = -1$
output $\beta$

As an algorithm, this is not fully specified, as we have to specify a procedure for selecting $\gamma$ in each loop iteration. A reasonable approach is to simply choose $\gamma$ at random; this would be an example of a probabilistic algorithm, a notion that we will study in detail in Chapter 9. Let us assume for the moment that this makes sense from a mathematical and algorithmic point of view, so that with each loop iteration, we have a 50% chance of picking a “good” $\gamma$, that is, one that is not in $(\mathbb{Z}_p^*)^2$. It follows that, with high probability, we should find a “good” $\gamma$ in just a few loop iterations (the probability that after $k$ loop iterations we still have not found one is $1/2^k$), and that the expected number of loop iterations is just 2. The running time of each loop iteration is dominated by the cost of repeated squaring, which is $O(\operatorname{len}(p)^3)$. It follows that the expected running time of this algorithm (we will make this notion precise in Chapter 9) is $O(\operatorname{len}(p)^3)$.
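The trial-and-error loop translates directly into Python. In this sketch, the name `sqrt_minus_one` is ours. Python's `random` module stands in for the random choice of $\gamma$, which the text only makes rigorous in Chapter 9. The three-argument `pow` performs the repeated squaring, and the test $\beta^2 = -1$ is written as `beta * beta % p == p - 1`. The function assumes that $p$ really is a prime; for a composite input the loop need not terminate.

```python
import random

def sqrt_minus_one(p, rng=random):
    """Trial-and-error search for beta with beta^2 = -1 (mod p), for a prime
    p = 1 (mod 4). Each trial succeeds when gamma is a non-square, i.e., with
    probability 1/2, so the expected number of trials is 2."""
    if p % 4 != 1:
        raise ValueError("p must be congruent to 1 modulo 4")
    while True:
        gamma = rng.randrange(1, p)           # choose gamma in Z_p^* at random
        beta = pow(gamma, (p - 1) // 4, p)    # beta <- gamma^((p-1)/4), repeated squaring
        if beta * beta % p == p - 1:          # until beta^2 = -1
            return beta
```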

Let us now put all the ingredients together to get an algorithm to find $r, t$ such that $p = r^2 + t^2$.

1. Find $\beta \in \mathbb{Z}_p^*$ such that $\beta^2 = -1$, using the above “trial and error” strategy.

2. Set $b \leftarrow \operatorname{rep}(\beta)$ (so that $\beta = [b]$ and $b \in \{0, \ldots, p-1\}$).

3. Run the extended Euclidean algorithm on input $p, b$ to obtain $\operatorname{EEA}(p, b)$, and then apply Theorem 4.7 with $n := p$, $b$, and $r^* := t^* := \lfloor \sqrt{p} \rfloor + 1$, to obtain the values $r$ and $t$.

4. Output $r, t$.

When this algorithm terminates, we have $r^2 + t^2 = p$, as required. As we argued in the proof of Theorem 2.34, since $r \equiv bt \pmod p$ and $b^2 \equiv -1 \pmod p$, it follows that $r^2 + t^2 \equiv 0 \pmod p$, and since $0 < r^2 + t^2 < 2p$, we must have $r^2 + t^2 = p$. The (expected) running time of step 1 is $O(\operatorname{len}(p)^3)$. The running time of step 3 is $O(\operatorname{len}(p)^2)$ (note that we can compute $\lfloor \sqrt{p} \rfloor$ in time $O(\operatorname{len}(p)^2)$, using the algorithm in Exercise 3.29). Thus, the total (expected) running time is $O(\operatorname{len}(p)^3)$.
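Putting the pieces together, the following self-contained Python sketch implements steps 1 through 4. The name `two_squares` is hypothetical. The sketch repeats the trial-and-error search for $\beta$, then runs the extended Euclidean algorithm on $(p, b)$, tracking only the $r_i$ and $t_i$ that Theorem 4.7 needs. It uses `math.isqrt` (Python 3.8+) to compute $\lfloor \sqrt{p} \rfloor$ exactly, and, as before, it assumes $p$ is prime.

```python
import random
from math import isqrt

def two_squares(p, rng=random):
    """Return integers (r, t) with r^2 + t^2 = p, for a prime p = 1 (mod 4)."""
    # Step 1: trial and error for beta in Z_p^* with beta^2 = -1.
    while True:
        beta = pow(rng.randrange(1, p), (p - 1) // 4, p)
        if beta * beta % p == p - 1:
            break
    b = beta                       # step 2: b = rep(beta), already in {0, ..., p-1}
    bound = isqrt(p) + 1           # r* = t* = floor(sqrt(p)) + 1
    # Step 3: extended Euclid on (p, b), keeping only r_i and t_i, stopping at
    # the first remainder r_j < r*, as in Theorem 4.7.
    r0, t0, r1, t1 = p, 0, b, 1
    while r1 >= bound:
        q = r0 // r1
        r0, t0, r1, t1 = r1, t1, r0 - q * r1, t0 - q * t1
    return r1, t1                  # step 4
```

For $p = 1009$ this returns $(28, -15)$ or $(28, 15)$, depending on which of the two square roots of $-1$ (469 or 540) the search finds.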

Example 4.4. One can check that $p := 1009$ is prime and $p \equiv 1 \pmod 4$. Let us express $p$ as a sum of squares using the above algorithm. First, we need to find a square root of $-1$ modulo $p$. Let us just try a random number, say 17, and raise this to the power $(p-1)/4 = 252$. One can calculate that $17^{252} \equiv 469 \pmod{1009}$, and $469^2 \equiv -1 \pmod{1009}$. So we were lucky with our first try. Now we run the extended Euclidean algorithm on input $p = 1009$ and $b = 469$, obtaining the following data:


   i    r_i   q_i    s_i    t_i
   0   1009            1      0
   1    469     2      0      1
   2     71     6      1     -2
   3     43     1     -6     13
   4     28     1      7    -15
   5     15     1    -13     28
   6     13     1     20    -43
   7      2     6    -33     71
   8      1     2    218   -469
   9      0         -469   1009


The first $r_j$ that falls below the threshold $r^* = \lfloor \sqrt{1009} \rfloor + 1 = 32$ is at $j = 4$, and so we set $r := 28$ and $t := -15$. One verifies that $r^2 + t^2 = 28^2 + 15^2 = 1009 = p$. □

It is natural to ask whether one can solve this problem without resorting to randomization. The answer is “yes” (see §4.8), but the only known deterministic algorithms for this problem are quite impractical (albeit polynomial time). This
example illustrates the utility of randomization as an algorithm design technique, 
one that has proved to be invaluable in solving numerous algorithmic problems 
in number theory; indeed, in §3.4 we already mentioned its use in connection with 
primality testing, and we will explore many other applications as well (after putting 
the notion of a probabilistic algorithm on firm mathematical ground in Chapter 9). 


4.6 Rational reconstruction and applications 

In the previous section, we saw how to apply the extended Euclidean algorithm 
to obtain an effective version of Thue’s lemma. This lemma asserts that for given 
integers $n$ and $b$, there exists a pair of integers $(r, t)$ satisfying $r \equiv bt \pmod{n}$, and contained in a prescribed rectangle, provided the area of the rectangle is large enough, relative to $n$. In this section, we first prove a corresponding uniqueness theorem, under the assumption that the area of the rectangle is not too large. Of course, if $r \equiv bt \pmod{n}$, then for any non-zero integer $q$, we also have $rq \equiv b(tq) \pmod{n}$, and so we can only hope to guarantee that the ratio $r/t$ is unique. After proving this
uniqueness theorem, we show how to make this theorem computationally effective, 
and then develop several very neat applications. 
The basic uniqueness statement is as follows:


Theorem 4.8. Let $n, b, r^*, t^* \in \mathbb{Z}$ with $r^* > 0$, $t^* > 0$, and $n > 2r^*t^*$. Further, suppose that $r, t, r', t' \in \mathbb{Z}$ satisfy

$$r \equiv bt \pmod{n}, \quad |r| < r^*, \quad 0 < |t| \le t^*, \qquad (4.7)$$

$$r' \equiv bt' \pmod{n}, \quad |r'| < r^*, \quad 0 < |t'| \le t^*. \qquad (4.8)$$

Then $r/t = r'/t'$.

Proof. Consider the two congruences

$$r \equiv bt \pmod{n}, \qquad r' \equiv bt' \pmod{n}.$$

Subtracting $t$ times the second from $t'$ times the first, we obtain

$$rt' - r't \equiv 0 \pmod{n}.$$

However, we also have

$$|rt' - r't| \le |r||t'| + |r'||t| < 2r^*t^* < n.$$

Thus, $rt' - r't$ is a multiple of $n$, but less than $n$ in absolute value; the only possibility is that $rt' - r't = 0$, which means $r/t = r'/t'$. □

Now suppose that we are given $n, b, r^*, t^* \in \mathbb{Z}$ as in the above theorem; moreover, suppose that there exist $r, t \in \mathbb{Z}$ satisfying (4.7), but that these values are not given to us. Note that under the hypothesis of Theorem 4.8, Thue’s lemma cannot be used to ensure the existence of such $r$ and $t$, but in our eventual applications, we will have other reasons that will guarantee this. We would like to find $r', t' \in \mathbb{Z}$ satisfying (4.8), and if we do this, then by the theorem, we know that $r/t = r'/t'$. We call this the rational reconstruction problem. We can solve this problem efficiently using the extended Euclidean algorithm; indeed, just as in the case of our effective version of Thue’s lemma, the desired values of $r'$ and $t'$ appear as “natural by-products” of that algorithm. To state the result precisely, let us recall the notation we introduced in the last section: for integers $a, b$, with $a \ge b \ge 0$, we defined

$$\operatorname{EEA}(a, b) := \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1},$$

where $r_i$, $s_i$, and $t_i$, for $i = 0, \ldots, \lambda + 1$, are defined as in Theorem 4.3.

Theorem 4.9 (Rational reconstruction). Let $n, b, r^*, t^* \in \mathbb{Z}$ with $0 \le b < n$, $0 < r^* \le n$, and $t^* > 0$. Further, let $\operatorname{EEA}(n, b) = \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1}$, let $j$ be the smallest index (among $0, \ldots, \lambda + 1$) such that $r_j < r^*$, and set

$$r' := r_j, \quad s' := s_j, \quad\text{and}\quad t' := t_j.$$
Finally, suppose that there exist $r, s, t \in \mathbb{Z}$ such that

$$r = ns + bt, \quad |r| < r^*, \quad\text{and}\quad 0 < |t| \le t^*.$$

Then we have:

(i) $0 < |t'| \le t^*$;

(ii) if $n > 2r^*t^*$, then for some non-zero integer $q$,

$$r = r'q, \quad s = s'q, \quad\text{and}\quad t = t'q.$$

Proof. Since $r_0 = n \ge r^* > 0 = r_{\lambda+1}$, the value of $j$ is well defined, and moreover, $j \ge 1$, and we have the inequalities

$$0 \le r_j < r^* \le r_{j-1}, \quad 0 < |t_j|, \quad |r| < r^*, \quad\text{and}\quad 0 < |t| \le t^*, \qquad (4.9)$$

along with the identities

$$r_{j-1} = n s_{j-1} + b t_{j-1}, \qquad (4.10)$$

$$r_j = n s_j + b t_j, \qquad (4.11)$$

$$r = ns + bt. \qquad (4.12)$$

We now turn to part (i) of the theorem. Our goal is to prove that

$$|t_j| \le t^*. \qquad (4.13)$$

This is the hardest part of the proof. To this end, let

$$\varepsilon := s_j t_{j-1} - s_{j-1} t_j, \quad \mu := (t_{j-1} s - s_{j-1} t)/\varepsilon, \quad \nu := (s_j t - t_j s)/\varepsilon.$$

Since $\varepsilon = \pm 1$, the numbers $\mu$ and $\nu$ are integers; moreover, one may easily verify that they satisfy the equations

$$s_j \mu + s_{j-1} \nu = s, \qquad (4.14)$$

$$t_j \mu + t_{j-1} \nu = t. \qquad (4.15)$$

We now use these identities to prove (4.13). We consider three cases:

(i) Suppose $\nu = 0$. In this case, (4.15) implies $t_j \mid t$, and since $t \ne 0$, this implies $|t_j| \le |t| \le t^*$.

(ii) Suppose $\mu\nu < 0$. In this case, since $t_j$ and $t_{j-1}$ have opposite sign, (4.15) implies $|t| = |t_j \mu| + |t_{j-1} \nu| \ge |t_j|$, and so again, we have $|t_j| \le |t| \le t^*$.

(iii) The only remaining possibility is that $\nu \ne 0$ and $\mu\nu \ge 0$. We argue that this is impossible. Adding $n$ times (4.14) to $b$ times (4.15), and using the identities (4.10), (4.11), and (4.12), we obtain

$$r_j \mu + r_{j-1} \nu = r.$$

If $\nu \ne 0$ and $\mu$ and $\nu$ had the same sign, we would have $|r| = |r_j \mu| + |r_{j-1} \nu| \ge r_{j-1}$, and hence $r_{j-1} \le |r| < r^*$; however, this contradicts the fact that $r_{j-1} \ge r^*$.

That proves the inequality (4.13). We now turn to the proof of part (ii) of the theorem, which relies critically on this inequality. Assume that

$$n > 2r^*t^*. \qquad (4.16)$$

From (4.11) and (4.12), we have

$$r_j \equiv b t_j \pmod{n} \quad\text{and}\quad r \equiv bt \pmod{n}.$$

Combining this with the inequalities (4.9), (4.13), and (4.16), we see that the hypotheses of Theorem 4.8 are satisfied, and so we may conclude that

$$r t_j - r_j t = 0. \qquad (4.17)$$

Subtracting $t_j$ times (4.12) from $t$ times (4.11), and using the identity (4.17), we obtain $n(s t_j - s_j t) = 0$, and hence

$$s t_j - s_j t = 0. \qquad (4.18)$$

From (4.18), we see that $t_j \mid s_j t$, and since $\gcd(s_j, t_j) = 1$, we must have $t_j \mid t$. So $t = t_j q$ for some $q$, and we must have $q \ne 0$ since $t \ne 0$. Substituting $t_j q$ for $t$ in equations (4.17) and (4.18) yields $r = r_j q$ and $s = s_j q$. That proves part (ii) of the theorem. □

In our applications in this text, we shall only directly use part (ii) of this theorem; however, part (i) has applications as well (see Exercise 4.18).
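The procedure of Theorem 4.9 is short enough to sketch in Python. The function name `rational_reconstruction` is our own. It assumes the theorem's hypotheses $0 \le b < n$, $0 < r^* \le n$, and $t^* > 0$, and returns $(r', s', t') = (r_j, s_j, t_j)$. Whether $r'/t'$ is the fraction we are after depends, as part (ii) says, on $n > 2r^*t^*$ and on the existence of a suitable $(r, s, t)$; the code checks neither. The usage lines use an arbitrarily chosen modulus, $n = 10^9 + 7$.

```python
from fractions import Fraction

def rational_reconstruction(n, b, r_star, t_star):
    """Theorem 4.9: assuming 0 <= b < n, 0 < r_star <= n and t_star > 0, return
    (r', s', t') = (r_j, s_j, t_j), where j is the first index with r_j < r_star
    in the extended Euclidean algorithm on (n, b). t_star does not affect the
    search; it only enters the theorem's guarantees."""
    r0, s0, t0 = n, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 >= r_star:     # index 0 never qualifies, since r_0 = n >= r_star
        q = r0 // r1
        r0, s0, t0, r1, s1, t1 = r1, s1, t1, r0 - q * r1, s0 - q * s1, t0 - q * t1
    return r1, s1, t1

n = 10**9 + 7
b = (-5 * pow(12, -1, n)) % n          # the image of -5/12 in Z_n
rp, sp, tp = rational_reconstruction(n, b, 1000, 1000)
print(Fraction(rp, tp))                # n > 2 * 1000 * 1000, so this prints -5/12
```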


4.6.1 Application: recovering fractions from their decimal expansions 
It should be a familiar fact to the reader that every real number has a decimal expansion, and that this decimal expansion is unique, provided one rules out those expansions that end in an infinite run of 9’s (e.g., $1/10 = 0.1000\cdots = 0.0999\cdots$). Now suppose that Alice and Bob play a game. Alice thinks of a rational number $z := s/t$, where $s$ and $t$ are integers with $0 \le s < t$, and tells Bob some of the high-order digits in the decimal expansion of $z$. Bob’s goal in the game is to determine $z$. Can he do this?

The answer is “yes,” provided Bob knows an upper bound $M$ on $t$, and provided Alice gives Bob enough digits. Of course, Bob probably remembers from grade school that the decimal expansion of $z$ is ultimately periodic, and that given enough digits of $z$ so that the periodic part is included, he can recover $z$. However, this technique is quite useless in practice, as the length of the period can be huge: $O(M)$ in the worst case (see Exercises 4.21-4.23 below). The method we discuss here requires only $O(\operatorname{len}(M))$ digits.

Suppose Alice gives Bob the high-order $k$ digits of $z$, for some $k \ge 1$. That is, if

$$z = 0.z_1 z_2 z_3 \cdots \qquad (4.19)$$

is the decimal expansion of $z$, then Alice gives Bob $z_1, \ldots, z_k$. Now, if $10^k$ is much smaller than $M^2$, the number $z$ is not even uniquely determined by these digits, since there are $\Omega(M^2)$ distinct rational numbers of the form $s/t$, with $0 \le s < t \le M$ (see Exercise 1.33). However, if $10^k > 2M^2$, then not only is $z$ uniquely determined by $z_1, \ldots, z_k$, but using Theorem 4.9, Bob can efficiently compute it.

We shall presently describe efficient algorithms for both Alice and Bob, but before doing so, we make a few general observations about the decimal expansion of $z$. Let $e$ be an arbitrary non-negative integer, and suppose that the decimal expansion of $z$ is as in (4.19). Observe that

$$10^e z = z_1 \cdots z_e . z_{e+1} z_{e+2} \cdots.$$

It follows that

$$\lfloor 10^e z \rfloor = z_1 \cdots z_e . 0. \qquad (4.20)$$

Since $z = s/t$, if we set $r := 10^e s \bmod t$, then $10^e s = \lfloor 10^e z \rfloor t + r$, and dividing this by $t$, we have $10^e z = \lfloor 10^e z \rfloor + r/t$, where $r/t \in [0, 1)$. Therefore,

$$\frac{10^e s \bmod t}{t} = 0.z_{e+1} z_{e+2} z_{e+3} \cdots. \qquad (4.21)$$

Next, consider Alice. Based on the above discussion, Alice may use the following simple, iterative algorithm to compute $z_1, \ldots, z_k$, for arbitrary $k \ge 1$, after she chooses $s$ and $t$:

$x_1 \leftarrow s$
for $i \leftarrow 1$ to $k$ do
    $y_i \leftarrow 10 x_i$
    $z_i \leftarrow \lfloor y_i / t \rfloor$
    $x_{i+1} \leftarrow y_i \bmod t$
output $z_1, \ldots, z_k$

Correctness follows easily from the observation that for each $i = 1, 2, \ldots$, we have $x_i = 10^{i-1} s \bmod t$. Indeed, applying (4.21) with $e = i - 1$, we have $x_i/t = 0.z_i z_{i+1} z_{i+2} \cdots$, and consequently, by (4.20) with $e = 1$ and $x_i/t$ in the role of $z$, we have $\lfloor 10 x_i / t \rfloor = z_i$. The total time for Alice’s computation is $O(k \operatorname{len}(M))$, since each loop iteration takes time $O(\operatorname{len}(M))$.
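Alice's loop translates line for line into Python. In this sketch, `alice_digits` is a hypothetical name, and Python's `//` and `%` play the roles of $\lfloor \cdot / t \rfloor$ and $\bmod\ t$.

```python
def alice_digits(s, t, k):
    """Return the first k decimal digits z_1, ..., z_k of s/t, assuming 0 <= s < t."""
    digits = []
    x = s                      # x_1 = s
    for _ in range(k):
        y = 10 * x             # y_i = 10 x_i
        digits.append(y // t)  # z_i = floor(y_i / t)
        x = y % t              # x_{i+1} = y_i mod t
    return digits
```

For example, `alice_digits(511, 710, 7)` returns `[7, 1, 9, 7, 1, 8, 3]`, the digits used in Example 4.5 below.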
Finally, consider Bob. Given the high-order digits $z_1, \ldots, z_k$ of $z = s/t$, along with the upper bound $M$ on $t$, he can compute $z$ as follows:

1. Compute $n \leftarrow 10^k$ and $b \leftarrow \sum_{i=1}^{k} z_i 10^{k-i}$.

2. Run the extended Euclidean algorithm on input $n, b$ to obtain $\operatorname{EEA}(n, b)$, and then apply Theorem 4.9 with $n$, $b$, and $r^* := t^* := M$, to obtain the values $r', s', t'$.

3. Output the rational number $-s'/t'$.

Let us analyze this algorithm, assuming that $10^k > 2M^2$.

For correctness, we must show that $z = -s'/t'$. To prove this, observe that by (4.20) with $e = k$, we have $b = \lfloor nz \rfloor = \lfloor ns/t \rfloor$. Moreover, if we set $r := ns \bmod t$, then we have

$$r = ns - bt, \quad 0 \le r < t \le r^*, \quad 0 < t \le t^*, \quad\text{and}\quad n > 2r^*t^*.$$

It follows that the integers $s', t'$ from Theorem 4.9 (applied with $-t$ in the role of $t$) satisfy $s = s'q$ and $-t = t'q$ for some non-zero integer $q$. Thus, $s/t = -s'/t'$, as required. As a bonus, since the extended Euclidean algorithm guarantees that $\gcd(s', t') = 1$, not only do we obtain $z$, but we obtain $z$ expressed as a fraction in lowest terms.

We leave it to the reader to verify that Bob’s computation may be performed in time $O(k^2)$.

We conclude that both Alice and Bob can successfully play this game with $k$ chosen so that $k = O(\operatorname{len}(M))$, in which case their algorithms run in time $O(\operatorname{len}(M)^2)$.
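Here is a minimal Python sketch of Bob's three steps. It uses `fractions.Fraction` from the standard library for the output. The name `bob_recover` is ours. The code assumes $10^k > 2M^2$ and that the digits really come from some $s/t$ with $0 \le s < t \le M$. It inlines the extended Euclidean algorithm and stops at the first remainder below $r^* = M$.

```python
from fractions import Fraction

def bob_recover(digits, M):
    """Recover z = s/t, where 0 <= s < t <= M, from its first k = len(digits)
    decimal digits, assuming 10^k > 2 M^2."""
    k = len(digits)
    n = 10 ** k                                  # step 1: n = 10^k
    b = 0
    for z_i in digits:                           # b = sum of z_i * 10^(k-i)
        b = 10 * b + z_i
    # Step 2: extended Euclid on (n, b), stopping at the first r_j < r* = M.
    r0, s0, t0, r1, s1, t1 = n, 1, 0, b, 0, 1
    while r1 >= M:
        q = r0 // r1
        r0, s0, t0, r1, s1, t1 = r1, s1, t1, r0 - q * r1, s0 - q * s1, t0 - q * t1
    return Fraction(-s1, t1)                     # step 3: z = -s'/t'
```

With the data of Example 4.5 below, `bob_recover([7, 1, 9, 7, 1, 8, 3], 1000)` returns `Fraction(511, 710)`.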

Example 4.5. Alice chooses integers $s, t$, with $0 \le s < t \le 1000$, and tells Bob the high-order seven digits in the decimal expansion of $z := s/t$, from which Bob should be able to compute $z$. Suppose $s = 511$ and $t = 710$. Then $s/t = 0.7197183098591549\cdots$. Bob receives the digits 7, 1, 9, 7, 1, 8, 3, and computes $n = 10^7$ and $b = 7197183$. Running the extended Euclidean algorithm on input $n, b$, Bob obtains the data in Fig. 4.1. The first $r_j$ that falls below the threshold $r^* = 1000$ is at $j = 10$, and Bob reads off $s' = 511$ and $t' = -710$, from which he obtains $z = -s'/t' = 511/710$.

Another interesting phenomenon to observe in Fig. 4.1 is that the fractions $-s_i/t_i$ are very good approximations to the fraction $b/n = 7197183/10000000$. Indeed, if we compute the error terms $b/n + s_i/t_i$, for $i = 1, \ldots, 5$, we get (approximately)

$$0.72, \quad -0.28, \quad 0.053, \quad -0.03, \quad 0.0054.$$

Thus, we can approximate the “complicated” fraction 7197183/10000000 by the “very simple” fraction 5/7, introducing an absolute error of less than 0.006. Exercise 4.18 explores this “data compression” capability of Euclid’s algorithm in more generality. □
Fig. 4.1. Bob’s data from the extended Euclidean algorithm


4.6.2 Application: Chinese remaindering with errors 
One interpretation of the Chinese remainder theorem is that if we “encode” an integer $a$, with $0 \le a < n$, as the sequence $(a_1, \ldots, a_k)$, where $a_i = a \bmod n_i$, for $i = 1, \ldots, k$, then we can efficiently recover $a$ from this encoding. Here, of course, $n = n_1 \cdots n_k$, and the family $\{n_i\}_{i=1}^{k}$ is pairwise relatively prime.

Suppose that Alice encodes $a$ as $(a_1, \ldots, a_k)$, and sends this encoding to Bob over some communication network. However, because the network is not perfect, some (but hopefully not too many) of the values $a_i$ may be corrupted during the transmission of the encoding. The question is, can Bob still efficiently recover the original $a$ from its corrupted encoding?

To make the problem more precise, suppose that the original, correct encoding of $a$ is $(a_1, \ldots, a_k)$, and the corrupted encoding is $(b_1, \ldots, b_k)$. Let us define $G \subseteq \{1, \ldots, k\}$ to be the set of “good” positions $i$ with $a_i = b_i$, and $B \subseteq \{1, \ldots, k\}$ to be the set of “bad” positions $i$ with $a_i \ne b_i$. We shall assume that $|B| \le \ell$, where $\ell$ is some specified parameter.

Of course, if Bob hopes to recover $a$, we need to build some redundancy into the system; that is, we must require that $0 \le a < M$ for some bound $M$ that is somewhat smaller than $n$. Now, if Bob knew the location of bad positions, and if the product of the $n_i$'s at the good positions exceeds $M$, then Bob could simply discard the errors, and reconstruct $a$ by applying the Chinese remainder theorem to the $a_i$'s and $n_i$'s at the good positions. However, in general, Bob will not know a priori the locations of the bad positions, and so this approach will not work.

Despite these apparent difficulties, Theorem 4.9 may be used to solve the problem quite easily, as follows. Let $P$ be an upper bound on the product of any $\ell$ of the integers $n_1, \ldots, n_k$ (e.g., we could take $P$ to be the product of the $\ell$ largest numbers among $n_1, \ldots, n_k$). Further, let us assume that $n > 2MP^2$.

Now, suppose Bob obtains the corrupted encoding $(b_1, \ldots, b_k)$. Here is what Bob does to recover $a$:

1. Apply the Chinese remainder theorem, obtaining the integer $b$ satisfying $0 \le b < n$ and $b \equiv b_i \pmod{n_i}$ for $i = 1, \ldots, k$.

2. Run the extended Euclidean algorithm on input $n, b$ to obtain $\operatorname{EEA}(n, b)$, and then apply Theorem 4.9 with $n$, $b$, $r^* := MP$ and $t^* := P$, to obtain values $r', s', t'$.

3. If $t' \mid r'$, output the integer $r'/t'$; otherwise, output “error.”

We claim that the above procedure outputs $a$, under our assumption that the set $B$ of bad positions is of size at most $\ell$. To see this, let $t := \prod_{i \in B} n_i$. By construction, we have $1 \le t \le P$. Also, let $r := at$, and note that $0 \le r < r^*$ and $0 < t \le t^*$. We claim that

$$r \equiv bt \pmod{n}. \qquad (4.22)$$

To show that (4.22) holds, it suffices to show that

$$at \equiv bt \pmod{n_i} \qquad (4.23)$$

for all $i = 1, \ldots, k$. To show this, for each index $i$ we consider two cases:

Case 1: $i \in G$. In this case, we have $a_i = b_i$, and therefore,

$$at \equiv a_i t = b_i t \equiv bt \pmod{n_i}.$$

Case 2: $i \in B$. In this case, we have $n_i \mid t$, and therefore,

$$at \equiv 0 \equiv bt \pmod{n_i}.$$

Thus, (4.23) holds for all $i = 1, \ldots, k$, and so it follows that (4.22) holds. Since (4.22) says that $r = ns + bt$ for some integer $s$, and since $n > 2MP^2 = 2r^*t^*$, part (ii) of Theorem 4.9 applies. Therefore, the values $r', t'$ obtained from Theorem 4.9 satisfy

$$\frac{r'}{t'} = \frac{r}{t} = \frac{at}{t} = a.$$

In particular, $t' \mid r'$, and the procedure outputs $a$.
One easily checks that both the procedures to encode and decode a value $a$ run in time $O(\operatorname{len}(n)^2)$.

The above scheme is an example of an error correcting code, and is actually 
the integer analog of a Reed-Solomon code. 
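The following Python sketch implements the encoder and Bob's decoding procedure (steps 1 through 3) under the stated assumptions. The helper names `crt`, `encode`, and `decode` are hypothetical, and `decode` returns `None` where the text outputs “error”. The toy parameters in the usage note are ours, chosen only so that $n > 2MP^2$.

```python
from math import prod

def crt(residues, moduli):
    """Least non-negative x with x = residues[i] (mod moduli[i]) for all i;
    the moduli must be pairwise relatively prime."""
    n = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        m_i = n // n_i
        x += a_i * m_i * pow(m_i, -1, n_i)
    return x % n

def encode(a, moduli):
    """Alice's encoding (a_1, ..., a_k) with a_i = a mod n_i."""
    return [a % n_i for n_i in moduli]

def decode(received, moduli, M, P):
    """Bob's steps 1-3. Assumes 0 <= a < M, n = prod(moduli) > 2*M*P^2, and that
    P bounds the product of the moduli at the corrupted positions.
    Returns a, or None in place of "error"."""
    n = prod(moduli)
    b = crt(received, moduli)                  # step 1: 0 <= b < n
    r_star = M * P                             # step 2: r* = MP (t* = P is implicit)
    r0, t0, r1, t1 = n, 0, b, 1
    while r1 >= r_star:                        # EEA, stop at first r_j < r*
        q = r0 // r1
        r0, t0, r1, t1 = r1, t1, r0 - q * r1, t0 - q * t1
    if r1 % t1 == 0:                           # step 3: does t' divide r'?
        return r1 // t1
    return None
```

For example, take the moduli 11, 13, 17, 19, 23, 29, 31, 37, 41 (so $n \approx 1.45 \cdot 10^{12}$), $M = 1000$, and $\ell = 2$. We may then take $P = 37 \cdot 41 = 1517$, giving $2MP^2 \approx 4.6 \cdot 10^9 < n$. Corrupting any two entries of `encode(777, moduli)` still lets `decode` return 777.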

Example 4.6. Suppose we want to encode a 1024-bit message as a sequence of 16-bit blocks, so that the above scheme can correct up to 3 corrupted blocks. Without any error correction, we would need just $1024/16 = 64$ blocks. However, to correct this many errors, we need a few extra blocks; in fact, 7 will do.

Of course, a 1024-bit message can naturally be viewed as an integer $a$ in the set $\{0, \ldots, 2^{1024} - 1\}$, and the $i$th 16-bit block in the encoding can be viewed as an integer $a_i$ in the set $\{0, \ldots, 2^{16} - 1\}$. Setting $k := 71$, we select $k$ primes, $n_1, \ldots, n_k$, each 16 bits in length. In fact, let us choose $n_1, \ldots, n_k$ to be the largest $k$ primes under $2^{16}$. If we do this, then the smallest prime among the $n_i$'s turns out to be 64717, which is greater than $2^{15.98}$. We may set $M := 2^{1024}$, and since we want to correct up to 3 errors, we may set $P := 2^{3 \cdot 16}$. Then with $n := \prod_i n_i$, we have

$$n > 2^{71 \cdot 15.98} = 2^{1134.58} > 2^{1121} = 2^{1 + 1024 + 6 \cdot 16} = 2MP^2.$$

Thus, with these parameter settings, the above scheme will correct up to 3 corrupted blocks. This comes at a cost of increasing the length of the message from 1024 bits to $71 \cdot 16 = 1136$ bits, an increase of about 11%. □
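One can check the parameter arithmetic of this example mechanically. The sketch below is our own code, using a plain sieve of Eratosthenes (the name `largest_primes_below` is hypothetical). It selects the 71 largest primes below $2^{16}$ and compares their product with $2MP^2$. It prints the smallest selected prime, whether $n > 2MP^2$ holds, and the bit length of $n$.

```python
from math import prod

def largest_primes_below(limit, k):
    """The k largest primes below `limit`, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytes(len(range(i * i, limit, i)))
    return [i for i in range(limit - 1, 1, -1) if sieve[i]][:k]

moduli = largest_primes_below(2 ** 16, 71)   # n_1, ..., n_71
M, P = 2 ** 1024, 2 ** (3 * 16)              # a < 2^1024; up to 3 bad 16-bit blocks
n = prod(moduli)
print(min(moduli), n > 2 * M * P ** 2, n.bit_length())
```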

4.6.3 Applications to symbolic algebra 

Rational reconstruction also has a number of applications in symbolic algebra. We 
briefly sketch one such application here. Suppose that we want to find the solution 
v to the equation vA = w, where we are given as input a non-singular square 
integer matrix A and an integer vector w. The solution vector v will, in general, 
have rational entries. We stress that we want to compute the exact solution v, and 
not some floating point approximation to it. Now, we could solve for v directly 
using Gaussian elimination; however, the intermediate quantities computed by that 
algorithm would be rational numbers whose numerators and denominators might 
get quite large, leading to a rather lengthy computation (however, it is possible to 
show that the overall running time is still polynomial in the input length). 

Another approach is to compute a solution vector modulo n, where n is a power 
of a prime that does not divide the determinant of A. Provided n is large enough, 
one can then recover the solution vector v using rational reconstruction. With this 
approach, all of the computations can be carried out using arithmetic on integers 
not too much larger than n, leading to a more efficient algorithm. More of the 
details of this procedure are developed later, in Exercise 14.18. 





Exercise 4.18. Let $n, b \in \mathbb{Z}$ with $0 \le b < n$, and let $\operatorname{EEA}(n, b) = \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1}$. This exercise develops some key properties of the fractions $-s_i/t_i$ as approximations to $b/n$. For $i = 1, \ldots, \lambda + 1$, let $\varepsilon_i := b/n + s_i/t_i$.

(a) Show that $\varepsilon_i = r_i/(t_i n)$ for $i = 1, \ldots, \lambda + 1$.

(b) Show that successive $\varepsilon_i$'s strictly decrease in absolute value, and alternate in sign.

(c) Show that $|\varepsilon_i| < 1/t_i^2$ for $i = 1, \ldots, \lambda$, and $\varepsilon_{\lambda+1} = 0$.

(d) Show that for all $s, t \in \mathbb{Z}$ with $t \ne 0$, if $|b/n - s/t| < 1/(2t^2)$, then $s/t = -s_i/t_i$ for some $i = 1, \ldots, \lambda + 1$. Hint: use part (ii) of Theorem 4.9.

(e) Consider a fixed index $i \in \{2, \ldots, \lambda + 1\}$. Show that for all $s, t \in \mathbb{Z}$, if $0 < |t| \le |t_i|$ and $|b/n - s/t| \le |\varepsilon_i|$, then $s/t = -s_i/t_i$. In this sense, $-s_i/t_i$ is the unique, best approximation to $b/n$ among all fractions of denominator at most $|t_i|$. Hint: use part (i) of Theorem 4.9.

Exercise 4.19. Using the decimal approximation $\pi \approx 3.141592654$, apply Euclid’s algorithm to calculate a rational number of denominator less than 1000 that is within $10^{-6}$ of $\pi$. Illustrate the computation with a table as in Fig. 4.1.

Exercise 4.20. Show that given integers $s, t, k$, with $0 \le s < t$, and $k > 0$, we can compute the $k$th digit in the decimal expansion of $s/t$ in time $O(\operatorname{len}(k) \operatorname{len}(t)^2)$.

For the following exercises, we need a definition. Let $\Psi = \{\psi_i\}_{i=1}^{\infty}$ be a sequence of elements drawn from some arbitrary set. For integers $k \ge 0$ and $\ell \ge 1$, we say that $\Psi$ is $(k, \ell)$-periodic if $\psi_i = \psi_{i+\ell}$ for all $i > k$; in addition, we say that $\Psi$ is ultimately periodic if it is $(k, \ell)$-periodic for some $(k, \ell)$.

Exercise 4.21. Show that if a sequence $\Psi$ is ultimately periodic, then it is $(k^*, \ell^*)$-periodic for some uniquely determined pair $(k^*, \ell^*)$ for which the following holds: for every pair $(k, \ell)$ such that $\Psi$ is $(k, \ell)$-periodic, we have $k^* \le k$ and $\ell^* \mid \ell$.

The value $\ell^*$ in the above exercise is called the period of $\Psi$, and $k^*$ is called the pre-period of $\Psi$. If its pre-period is zero, then $\Psi$ is called purely periodic.

Exercise 4.22. Let z be a real number whose decimal expansion is an ultimately 
periodic sequence. Show that z is rational. 

Exercise 4.23. Let $z = s/t \in \mathbb{Q}$, where $s$ and $t$ are relatively prime integers with $0 \le s < t$. Show that:

(a) there exist integers $k, k'$ such that $0 \le k < k'$ and $s 10^k \equiv s 10^{k'} \pmod{t}$;

(b) for all integers $k, k'$ with $0 \le k < k'$, the decimal expansion of $z$ is $(k, k' - k)$-periodic if and only if $s 10^k \equiv s 10^{k'} \pmod{t}$;

(c) if $\gcd(10, t) = 1$, then the decimal expansion of $z$ is purely periodic with period equal to the multiplicative order of 10 modulo $t$;

(d) more generally, if $k$ is the smallest non-negative integer such that 10 and $t' := t/\gcd(10^k, t)$ are relatively prime, then the decimal expansion of $z$ is ultimately periodic with pre-period $k$ and period equal to the multiplicative order of 10 modulo $t'$.

A famous conjecture of Artin postulates that for every integer $d$, not equal to $-1$ or to the square of an integer, there are infinitely many primes $t$ such that $d$ has multiplicative order $t - 1$ modulo $t$. If Artin’s conjecture is true, then by part (c) of the previous exercise, there are infinitely many primes $t$ such that the decimal expansion of $s/t$, for every $s$ with $0 < s < t$, is a purely periodic sequence of period $t - 1$. In light of these observations, the “grade school” method of computing a fraction from its decimal expansion using the period is hopelessly impractical.
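Parts (c) and (d) of Exercise 4.23 suggest a direct way to compute the pre-period and period of $s/t$ without generating any digits. The sketch below is our own (the function names are hypothetical). It finds the order of 10 by brute force, which takes time proportional to the period itself and so is practical only for small $t$, in line with the remark above.

```python
from math import gcd

def multiplicative_order(a, m):
    """Smallest e >= 1 with a^e = 1 (mod m), assuming gcd(a, m) = 1.
    Brute force: the running time grows with the order itself."""
    e, x = 1, a % m
    while x != 1 % m:
        x = x * a % m
        e += 1
    return e

def preperiod_and_period(s, t):
    """Pre-period and period of the decimal expansion of s/t, 0 <= s < t,
    following parts (c) and (d) of Exercise 4.23 after reducing s/t."""
    t //= gcd(s, t)                      # now s/t is in lowest terms
    k = 0
    while gcd(10, t // gcd(10 ** k, t)) != 1:
        k += 1                           # smallest k with gcd(10, t') = 1
    return k, multiplicative_order(10, t // gcd(10 ** k, t))
```

For example, `preperiod_and_period(1, 7)` gives `(0, 6)`, matching $1/7 = 0.142857142857\cdots$; here 7 is one of the primes $t$ for which the period is $t - 1$. Likewise, `preperiod_and_period(1, 12)` gives `(2, 1)`, matching $1/12 = 0.08333\cdots$.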


4.7 The RSA cryptosystem 

One of the more exciting uses of number theory in recent decades is its application 
to cryptography. In this section, we give a brief overview of the RSA cryptosystem, 
named after its inventors Rivest, Shamir, and Adleman. At this point in the text, 
we already have the concepts and tools at our disposal necessary to understand the 
basic operation of this system, even though a full understanding of the system will 
require other ideas that will be developed later in the text. 

Suppose that Alice wants to send a secret message to Bob over an insecure net- 
work. An adversary may be able to eavesdrop on the network, and so sending the 
message “in the clear” is not an option. Using older, more traditional cryptographic 
techniques would require that Alice and Bob share a secret key between them; 
however, this creates the problem of securely generating such a shared secret. The 
RSA cryptosystem is an example of a public key cryptosystem. To use the system, Bob simply places a “public key” in the equivalent of an electronic telephone book, while keeping a corresponding “private key” secret. To send a secret message to Bob, Alice obtains Bob’s public key from the telephone book, and uses this to encrypt her message. Upon receipt of the encrypted message, Bob uses his private key to decrypt it, obtaining the original message.

Here is how the RSA cryptosystem works. To generate a public key/private key
pair, Bob generates two very large, random primes $p$ and $q$, with $p \neq q$. To be
secure, $p$ and $q$ should be quite large; in practice, they are chosen to be around 512
bits in length. Efficient algorithms for generating such primes exist, and we shall
discuss them in detail later in the text (that there are sufficiently many primes of a
given bit length will be discussed in Chapter 5; algorithms for generating them will
be discussed at a high level in §9.4, and in greater detail in Chapter 10). Next, Bob
computes $n := pq$. Bob also selects an integer $e > 1$ such that $\gcd(e, \varphi(n)) = 1$,
where $\varphi$ is Euler’s phi function. Here, $\varphi(n) = (p-1)(q-1)$. Finally, Bob computes
$d := e^{-1} \bmod \varphi(n)$, using the extended Euclidean algorithm. The public key is the
pair $(n, e)$, and the private key is the pair $(n, d)$. The integer $e$ is called the “encryption
exponent” and $d$ is called the “decryption exponent.” In practice, the integers
$n$ and $d$ are about 1024 bits in length, while $e$ is usually significantly shorter.
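The key-generation steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming deliberately tiny primes ($p = 61$, $q = 53$) and $e = 17$, all our own choices and far too small to be secure; the function name `rsa_keygen` is hypothetical. Python’s built-in `pow(e, -1, phi)` (Python 3.8 or later) computes the modular inverse, standing in for the extended Euclidean algorithm.

```python
from math import gcd


def rsa_keygen(p: int, q: int, e: int):
    """Toy RSA key generation from two given distinct primes p and q.

    Returns (public_key, private_key) = ((n, e), (n, d)).
    No primality testing or randomness: for illustration only.
    """
    if p == q:
        raise ValueError("p and q must be distinct")
    n = p * q
    phi = (p - 1) * (q - 1)          # Euler's phi function at n = pq
    if e <= 1 or gcd(e, phi) != 1:
        raise ValueError("need e > 1 with gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)              # d := e^{-1} mod phi(n)  (Python 3.8+)
    return (n, e), (n, d)


public, private = rsa_keygen(61, 53, 17)
print(public, private)   # (3233, 17) (3233, 2753)
```

Here $\varphi(3233) = 60 \cdot 52 = 3120$, and indeed $17 \cdot 2753 = 46801 = 15 \cdot 3120 + 1$.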

After Bob publishes his public key $(n, e)$, Alice may send a secret message to
Bob as follows. Suppose that a message is encoded in some canonical way as a
number between $0$ and $n - 1$; we can always interpret a bit string of length less
than $\operatorname{len}(n)$ as such a number. Thus, we may assume that a message is an element
$\alpha$ of $\mathbb{Z}_n$. To encrypt the message $\alpha$, Alice simply computes $\beta := \alpha^e$ using repeated
squaring. The encrypted message is $\beta$. When Bob receives $\beta$, he computes $\gamma := \beta^d$,
and interprets $\gamma$ as a message.
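Encryption and decryption are each a single modular exponentiation; Python’s three-argument `pow` carries out the repeated squaring. The sketch below reuses the toy key $(n, e) = (3233, 17)$ with $d = 2753$ (from $p = 61$, $q = 53$, our own illustrative choice); `encrypt` and `decrypt` are hypothetical helper names, and this is textbook RSA with no padding.

```python
# Toy key from p = 61, q = 53, e = 17; d = 17^{-1} mod 3120 = 2753.
N, E, D = 3233, 17, 2753


def encrypt(alpha: int, n: int = N, e: int = E) -> int:
    """beta := alpha^e in Z_n (repeated squaring inside pow)."""
    return pow(alpha, e, n)


def decrypt(beta: int, n: int = N, d: int = D) -> int:
    """gamma := beta^d in Z_n."""
    return pow(beta, d, n)


alpha = 65
beta = encrypt(alpha)
gamma = decrypt(beta)
print(beta, gamma)   # 2790 65
```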

The most basic requirement of any encryption scheme is that decryption should
“undo” encryption. In this case, this means that for all $\alpha \in \mathbb{Z}_n$, we should have

$$(\alpha^e)^d = \alpha. \qquad (4.24)$$

If $\alpha \in \mathbb{Z}_n^*$, then this is clearly the case, since we have $ed = 1 + \varphi(n)k$ for some
positive integer $k$, and hence by Euler’s theorem (Theorem 2.13), we have

$$(\alpha^e)^d = \alpha^{ed} = \alpha^{1+\varphi(n)k} = \alpha \cdot \alpha^{\varphi(n)k} = \alpha.$$

To argue that (4.24) holds in general, let $\alpha$ be an arbitrary element of $\mathbb{Z}_n$, and
suppose $\alpha = [a]_n$. If $a \equiv 0 \pmod{p}$, then trivially $a^{ed} \equiv 0 \pmod{p}$; otherwise,

$$a^{ed} = a^{1+\varphi(n)k} = a \cdot a^{\varphi(n)k} \equiv a \pmod{p},$$

where the last congruence follows from the fact that $\varphi(n)k$ is a multiple of $p - 1$,
which is a multiple of the multiplicative order of $a$ modulo $p$ (again by Euler’s theorem).
Thus, we have shown that $a^{ed} \equiv a \pmod{p}$. The same argument shows that
$a^{ed} \equiv a \pmod{q}$, and these two congruences together imply that $a^{ed} \equiv a \pmod{n}$.
Thus, we have shown that equation (4.24) holds for all $\alpha \in \mathbb{Z}_n$.
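Since the argument above covers every $\alpha \in \mathbb{Z}_n$, including the non-units, it can be checked exhaustively for a toy modulus. This sketch (hypothetical helper `rsa_identity_holds`, toy parameters of our own choosing) tests (4.24) for all 3233 residues modulo $n = 61 \cdot 53$, of which $n - \varphi(n) = 113$ are non-units.

```python
from math import gcd


def rsa_identity_holds(p: int, q: int, e: int) -> bool:
    """Check (alpha^e)^d == alpha for EVERY alpha in Z_n, n = pq,
    including the non-units (alpha with gcd(alpha, n) > 1)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)
    return all(pow(pow(a, e, n), d, n) == a for a in range(n))


non_units = [a for a in range(61 * 53) if gcd(a, 61 * 53) > 1]
print(len(non_units))                   # 113, i.e., n - phi(n)
print(rsa_identity_holds(61, 53, 17))   # True
```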

Of course, the interesting question about the RSA cryptosystem is whether or not
it really is secure. Now, if an adversary, given only the public key $(n, e)$, were able
to factor $n$, then he could easily compute the decryption exponent $d$ himself using
the same algorithm used by Bob. It is widely believed that factoring $n$ is computationally
infeasible, for sufficiently large $n$, and so this line of attack is ineffective,
barring a breakthrough in factorization algorithms. Indeed, while trying to factor
$n$ by brute-force search is clearly infeasible, there are much faster algorithms, but
even these are not fast enough to pose a serious threat to the security of the RSA
cryptosystem. We shall discuss some of these faster algorithms in some detail later
in the text (in Chapter 15).

Can one break the RSA cryptosystem without factoring $n$? For example, it is
natural to ask whether one can compute the decryption exponent $d$ without having
to go to the trouble of factoring $n$. It turns out that the answer to this question is
“no”: if one could compute the decryption exponent $d$, then $ed - 1$ would be a
multiple of $\varphi(n)$, and as we shall see later in §10.4, given any multiple of $\varphi(n)$,
we can easily factor $n$. Thus, computing the decryption exponent is equivalent to
factoring $n$, and so this line of attack is also ineffective. But there still could be
other lines of attack. For example, even if we assume that factoring large numbers
is infeasible, this is not enough to guarantee that for a given encrypted message $\beta$,
the adversary is unable to compute $\beta^d$ (although nobody actually knows how to do
this without first factoring $n$).
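The method behind this claim is developed in §10.4 and is not reproduced here. As a rough preview only, the following sketch shows one standard randomized method of this kind (our own rendering, which may differ from the text’s): write the multiple as $m = 2^s t$ with $t$ odd, pick a random $a$, and walk the sequence $a^t, a^{2t}, \ldots$ modulo $n$ looking for a square root of 1 other than $\pm 1$; any such root $x$ yields the factor $\gcd(x - 1, n)$. The function name is hypothetical, and the toy values reuse $n = 3233$, $e = 17$, $d = 2753$.

```python
import random
from math import gcd


def factor_from_phi_multiple(n: int, m: int, rng=None):
    """Given n = pq (p, q distinct odd primes) and a positive multiple m of phi(n),
    return (smaller factor, larger factor). Randomized: retries until it succeeds."""
    rng = rng or random.Random()
    s, t = 0, m
    while t % 2 == 0:                  # write m = 2^s * t with t odd
        s, t = s + 1, t // 2
    while True:
        a = rng.randrange(2, n - 1)
        g = gcd(a, n)
        if g > 1:                      # a already shares a factor with n
            return tuple(sorted((g, n // g)))
        x = pow(a, t, n)
        for _ in range(s):             # walk a^t, a^(2t), ..., a^(2^s t) = 1
            y = x * x % n
            if y == 1 and x not in (1, n - 1):
                g = gcd(x - 1, n)      # x is a square root of 1 other than +-1
                return tuple(sorted((g, n // g)))
            x = y


e, d = 17, 2753                        # toy key from p = 61, q = 53
print(factor_from_phi_multiple(3233, e * d - 1))   # (53, 61)
```

Because $a^{m} \equiv 1 \pmod{n}$ for every unit $a$, the walk always ends at 1; a trial fails only when it starts at 1 or reaches 1 directly from $-1$, in which case a fresh $a$ is drawn.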

The reader should be warned that the proper notion of security for an encryption
scheme is quite subtle, and a detailed discussion of this is well beyond the
scope of this text. Indeed, the simple version of RSA presented here suffers from a
number of security problems (because of this, actual implementations of public-key
encryption schemes based on RSA are somewhat more complicated). We
mention one such problem here (others are examined in some of the exercises
below). Suppose an eavesdropping adversary knows that Alice will send one of
a few, known, candidate messages. For example, an adversary may know that
Alice’s message is either “let’s meet today” or “let’s meet tomorrow.” In this case,
the adversary can encrypt for himself each of the candidate messages, intercept
Alice’s actual encrypted message, and then by simply comparing encryptions, the
adversary can determine which particular message Alice encrypted. This type of
attack works simply because the encryption algorithm is deterministic, and in fact,
any deterministic encryption algorithm will be vulnerable to this type of attack. To
avoid this type of attack, one must use a probabilistic encryption algorithm. In the
case of the RSA cryptosystem, this is often achieved by padding the message with
some random bits before encrypting it (but even this must be done carefully).
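The comparison attack needs nothing beyond Bob’s public key. In this hedged sketch, the modulus is the product of the Mersenne primes $2^{89} - 1$ and $2^{107} - 1$ (well-known primes, so the key is for illustration only), $e = 65537$, and strings are encoded as integers with `int.from_bytes`; these parameters, the encoding, and the function names are all our own assumptions.

```python
# Toy public key (illustrative assumption): n = (2^89 - 1)(2^107 - 1), e = 65537.
N = (2**89 - 1) * (2**107 - 1)
E = 65537


def encode(msg: str) -> int:
    """Hypothetical canonical encoding of a short ASCII string as an integer < N."""
    a = int.from_bytes(msg.encode("ascii"), "big")
    if a >= N:
        raise ValueError("message too long for this modulus")
    return a


def encrypt(msg: str) -> int:
    """Deterministic textbook RSA: beta = alpha^e mod n."""
    return pow(encode(msg), E, N)


def guess_plaintext(ciphertext: int, candidates):
    """Adversary: encrypt each candidate under the PUBLIC key and compare."""
    for m in candidates:
        if encrypt(m) == ciphertext:
            return m
    return None


candidates = ["let's meet today", "let's meet tomorrow"]
intercepted = encrypt("let's meet tomorrow")          # Alice's actual ciphertext
print(guess_plaintext(intercepted, candidates))       # let's meet tomorrow
```

Because textbook RSA encryption is a fixed permutation of $\mathbb{Z}_n$, distinct candidates always produce distinct ciphertexts, so the comparison identifies Alice’s message exactly.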


Exercise 4.24. This exercise develops a method to speed up RSA decryption.
Suppose that we are given two distinct $\ell$-bit primes, $p$ and $q$, an element $\beta \in \mathbb{Z}_n$,
where $n := pq$, and an integer $d$, where $1 < d < \varphi(n)$. Using the algorithm from
Exercise 3.35, we can compute $\beta^d$ at a cost of essentially $2\ell$ squarings in $\mathbb{Z}_n$. Show
how this can be improved, making use of the factorization of $n$, so that the total cost
is essentially that of $\ell$ squarings in $\mathbb{Z}_p$ and $\ell$ squarings in $\mathbb{Z}_q$, leading to a roughly
four-fold speed-up in the running time.


Exercise 4.25. Alice submits a bid to an auction, and so that other bidders cannot
see her bid, she encrypts it under the public key of the auction service. Suppose
that the auction service provides a public key for an RSA encryption scheme, with
a modulus $n$. Assume that bids are encoded simply as integers between $0$ and $n - 1$
prior to encryption. Also, assume that Alice submits a bid that is a “round number,”
which in this case means that her bid is a number that is divisible by 10. Show how
an eavesdropper can submit an encryption of a bid that exceeds Alice’s bid by 10%,
without even knowing what Alice’s bid is. In particular, your attack should work
even if the space of possible bids is very large.

Exercise 4.26. To speed up RSA encryption, one may choose a very small
encryption exponent. This exercise develops a “small encryption exponent attack”
on RSA. Suppose Bob, Bill, and Betty have RSA public keys with moduli $n_1$, $n_2$,
and $n_3$, and all three use encryption exponent 3. Assume that $\{n_i\}_{i=1}^{3}$ is pairwise
relatively prime. Suppose that Alice sends an encryption of the same message to
Bob, Bill, and Betty; that is, Alice encodes her message as an integer $a$, with
$0 \le a < \min\{n_1, n_2, n_3\}$, and computes the three encrypted messages $\beta_i := [a^3]_{n_i}$,
for $i = 1, \ldots, 3$. Show how to recover Alice’s message from these three encrypted
messages.

Exercise 4.27. To speed up RSA decryption, one might choose a small decryption
exponent, and then derive the encryption exponent from this. This exercise
develops a “small decryption exponent attack” on RSA. Suppose $n = pq$, where
$p$ and $q$ are distinct primes with $\operatorname{len}(p) = \operatorname{len}(q)$. Let $d$ and $e$ be integers such
that $1 < d < \varphi(n)$, $1 < e < \varphi(n)$, and $de \equiv 1 \pmod{\varphi(n)}$. Further, assume
that $d < n^{1/4}/3$. Show how to efficiently compute $d$, given $n$ and $e$. Hint: since
$ed \equiv 1 \pmod{\varphi(n)}$, it follows that $ed = 1 + \varphi(n)k$ for an integer $k$ with $0 < k < d$;
let $r := nk - ed$, and show that $|r| < n^{3/4}$; next, show how to recover $d$ (along with
$r$ and $k$) using Theorem 4.9.


4.8 Notes 

The Euclidean algorithm as we have presented it here is not the fastest known
algorithm for computing greatest common divisors. The asymptotically fastest
known algorithm for computing the greatest common divisor of two numbers of
bit length at most $\ell$ runs in time $O(\ell \operatorname{len}(\ell))$ on a RAM, which is due to Schönhage
[85]. The same algorithm leads to Boolean circuits of size $O(\ell \operatorname{len}(\ell)^2 \operatorname{len}(\operatorname{len}(\ell)))$,
which using Fürer’s result [38], can be reduced to $O(\ell \operatorname{len}(\ell)^2 2^{O(\log^* n)})$. The same
complexity results also hold for the extended Euclidean algorithm, as well as for
Chinese remaindering, Thue’s lemma, and rational reconstruction.

Experience suggests that such fast algorithms for greatest common divisors are
not of much practical value, unless the integers involved are very large: at least
several tens of thousands of bits in length. The extra “log” factor and the rather
large multiplicative constants seem to slow things down too much.

The binary gcd algorithm (Exercise 4.6) is due to Stein [100]. The extended 
binary gcd algorithm (Exercise 4.10) was first described by Knuth [56], who 
attributes it to M. Penk. Our formulation of both of these algorithms closely follows 
that of Menezes, van Oorschot, and Vanstone [66]. Experience suggests that the
binary gcd algorithm is faster in practice than Euclid’s algorithm. 

Schoof [87] presents (among other things) a deterministic, polynomial-time
algorithm that computes a square root of $-1$ modulo $p$ for any given prime
$p \equiv 1 \pmod{4}$. If we use this algorithm in §4.5, we get a deterministic, polynomial-time
algorithm to compute integers $r$ and $t$ such that $p = r^2 + t^2$.

Our Theorem 4.9 is a generalization of one stated in Wang, Guy, and Davenport 
[103]. One can generalize Theorem 4.9 using the theory of continued fractions. 
With this, one can generalize Exercise 4.18 to deal with rational approximations to 
irrational numbers. More on this can be found, for example, in the book by Hardy 
and Wright [46]. 

The application of Euclid’s algorithm to computing a rational number from the
first digits of its decimal expansion was observed by Blum, Blum, and Shub [17],
where they considered the possibility of using such sequences of digits as a pseudo-random
number generator; the conclusion, of course, is that this is not such a
good idea.

The RSA cryptosystem was invented by Rivest, Shamir, and Adleman [82].
There is a vast literature on cryptography. One starting point is the book by
Menezes, van Oorschot, and Vanstone [66]. The attack in Exercise 4.27 is due
to Wiener [110]; this attack was recently strengthened by Boneh and Durfee [19].

This chapter concerns itself with the question: how many primes are there? In
Chapter 1, we proved that there are infinitely many primes; however, we are interested
in a more quantitative answer to this question; that is, we want to know how
“dense” the prime numbers are.

This chapter has a bit more of an “analytical” flavor than other chapters in this
text. However, we shall not make use of any mathematics beyond that of elementary
calculus.


5.1 Chebyshev’s theorem on the density of primes 

The natural way of measuring the density of primes is to count the number of
primes up to a bound $x$, where $x$ is a real number. To this end, we introduce
the function $\pi(x)$, whose value at each real number $x \ge 0$ is defined to be the
number of primes up to (and including) $x$. For example, $\pi(1) = 0$, $\pi(2) = 1$,
and $\pi(7.5) = 4$. The function $\pi(x)$ is an example of a “step function,” that is, a
function that changes values only at a discrete set of points. It might seem more
natural to define $\pi(x)$ only on the integers, but it is the tradition to define it over
the real numbers (and there are some technical benefits in doing so).
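As a quick illustration of $\pi(x)$ as a function of a real argument, here is a small Python sketch (the name `prime_pi` is ours) that counts the primes up to $\lfloor x \rfloor$ with a basic sieve. It is only meant for small $x$.

```python
import math


def prime_pi(x: float) -> int:
    """pi(x): the number of primes p <= x, for any real x (0 when x < 2).
    Sieves up to floor(x), so only suitable for small x."""
    n = math.floor(x)
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            # clear k*k, k*k + k, ..., up to n
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return sum(is_prime)


print(prime_pi(1), prime_pi(2), prime_pi(7.5))   # 0 1 4
```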

Let us first take a look at some values of $\pi(x)$. Table 5.1 shows values of $\pi(x)$ for
$x = 10^{3i}$ and $i = 1, \ldots, 6$. The third column of this table shows the value of $x/\pi(x)$
(to five decimal places). One can see that the differences between successive rows
of this third column are roughly the same (about 6.9), which suggests that the
function $x/\pi(x)$ grows logarithmically in $x$. Indeed, as $\log(10^3) \approx 6.9$, it would
not be unreasonable to guess that $x/\pi(x) \approx \log x$, or equivalently, $\pi(x) \approx x/\log x$
(as discussed in the Preliminaries, $\log x$ denotes the natural logarithm of $x$).
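The first two rows of Table 5.1 are small enough to reproduce by direct sieving. The sketch below (hypothetical helper `prime_pi`) does so and prints $x/\pi(x)$ to five decimal places; the remaining rows require far more sophisticated prime-counting methods than a plain sieve.

```python
import math


def prime_pi(n: int) -> int:
    """pi(n) for an integer n >= 2, via a bytearray sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return sum(is_prime)


rows = []
for i in (1, 2):                      # x = 10^3 and 10^6 only
    x = 10 ** (3 * i)
    p = prime_pi(x)
    rows.append((x, p, x / p))
    print(f"{x:>8} {p:>6} {x / p:.5f}")
print(f"row gap {rows[1][2] - rows[0][2]:.3f} vs log(10^3) = {math.log(1000):.3f}")
```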

Table 5.1. Some values of π(x)

    x         π(x)                    x/π(x)
    10^3      168                      5.95238
    10^6      78498                   12.73918
    10^9      50847534                19.66664
    10^12     37607912018             26.59015
    10^15     29844570422669          33.50693
    10^18     24739954287740860       40.42045

The following theorem is a first, and important, step towards making the
above guesswork more rigorous (the statements of this and many other results in
this chapter make use of the asymptotic notation introduced in §3.1):


Theorem 5.1 (Chebyshev’s theorem). We have

$$\pi(x) = \Theta(x/\log x).$$

It is not too difficult to prove this theorem, which we now proceed to do in several 
steps. We begin with some elementary bounds on binomial coefficients (see §A2): 


Lemma 5.2. If $m$ is a positive integer, then

$$\binom{2m}{m} \ge \frac{2^{2m}}{2m} \quad\text{and}\quad \binom{2m+1}{m} < 2^{2m}.$$


Proof. As $\binom{2m}{m}$ is the largest binomial coefficient in the binomial expansion of
$(1+1)^{2m}$, we have

$$2^{2m} = \sum_{i=0}^{2m} \binom{2m}{i} = 1 + \sum_{i=1}^{2m-1} \binom{2m}{i} + 1 \le 2 + (2m-1)\binom{2m}{m} \le 2m\binom{2m}{m},$$

where the last inequality uses $\binom{2m}{m} \ge 2$. This proves the first inequality. For the second, observe that the binomial coefficient
$\binom{2m+1}{m}$ occurs twice in the binomial expansion of $(1+1)^{2m+1}$, and is therefore less
than $2^{2m+1}/2 = 2^{2m}$. □
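Both inequalities of Lemma 5.2 can be spot-checked in exact integer arithmetic by clearing denominators, that is, by testing $2m\binom{2m}{m} \ge 4^m$ and $\binom{2m+1}{m} < 4^m$. This is a sanity check for small $m$ (our own choice of range), not a substitute for the proof; it assumes Python 3.8+ for `math.comb`.

```python
from math import comb


def lemma_5_2_holds(m: int) -> bool:
    """Check 2m * C(2m, m) >= 4^m and C(2m+1, m) < 4^m with exact integers."""
    return 2 * m * comb(2 * m, m) >= 4 ** m and comb(2 * m + 1, m) < 4 ** m


print(all(lemma_5_2_holds(m) for m in range(1, 500)))   # True
```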


Next, recalling that $v_p(n)$ denotes the power to which a prime $p$ divides an integer
$n$, we continue with the following observation:


Lemma 5.3. Let $n$ be a positive integer. For every prime $p$, we have

$$v_p(n!) = \sum_{k \ge 1} \lfloor n/p^k \rfloor.$$

Proof. For all positive integers $j, k$, define $d_{jk} := 1$ if $p^k \mid j$, and $d_{jk} := 0$,
otherwise. Observe that $v_p(j) = \sum_{k \ge 1} d_{jk}$ (this sum is actually finite, since $d_{jk} = 0$
for all sufficiently large $k$). So we have

$$v_p(n!) = \sum_{j=1}^{n} v_p(j) = \sum_{j=1}^{n} \sum_{k \ge 1} d_{jk} = \sum_{k \ge 1} \sum_{j=1}^{n} d_{jk}.$$

Finally, note that $\sum_{j=1}^{n} d_{jk}$ is equal to the number of multiples of $p^k$ among the
integers $1, \ldots, n$, which by Exercise 1.3 is equal to $\lfloor n/p^k \rfloor$. □
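Lemma 5.3 translates directly into a loop over the powers $p, p^2, \ldots$ that do not exceed $n$. The sketch below (helper names `vp_factorial` and `vp` are ours) compares the formula with the exponent of $p$ in $n!$ computed by repeated division.

```python
from math import factorial


def vp_factorial(n: int, p: int) -> int:
    """v_p(n!) by Lemma 5.3: the sum of floor(n / p^k) over k >= 1."""
    total, pk = 0, p
    while pk <= n:              # terms with p^k > n are zero
        total += n // pk
        pk *= p
    return total


def vp(m: int, p: int) -> int:
    """Exponent of the prime p in the positive integer m, by repeated division."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


print(vp_factorial(100, 2), vp_factorial(100, 5))   # 97 24
print(all(vp_factorial(n, p) == vp(factorial(n), p)
          for n in range(1, 60) for p in (2, 3, 5, 7)))   # True
```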

The following theorem gives a lower bound on $\pi(x)$.

Theorem 5.4. $\pi(n) \ge \frac{1}{2}(\log 2)\, n/\log n$ for every integer $n \ge 2$.


Proof. Let $m$ be a positive integer, and consider the binomial coefficient

$$N := \binom{2m}{m} = \frac{(2m)!}{(m!)^2}.$$

It is clear that $N$ is divisible only by primes $p$ up to $2m$. Applying Lemma 5.3 to
the identity $N = (2m)!/(m!)^2$, we have

$$v_p(N) = \sum_{k \ge 1} \bigl(\lfloor 2m/p^k \rfloor - 2\lfloor m/p^k \rfloor\bigr).$$

Each term in this sum is either 0 or 1 (see Exercise 1.4), and for $k > \log(2m)/\log p$,
each term is zero. Thus, $v_p(N) \le \log(2m)/\log p$. So we have

$$\pi(2m)\log(2m) = \sum_{p \le 2m} \frac{\log(2m)}{\log p}\log p \ge \sum_{p \le 2m} v_p(N)\log p = \log N,$$

where the summations are over the primes $p$ up to $2m$. By Lemma 5.2, we have
$N \ge 2^{2m}/2m \ge 2^m$, and hence

$$\pi(2m)\log(2m) \ge m\log 2 = \tfrac{1}{2}(\log 2)(2m).$$

That proves the theorem for even $n$. Now consider odd $n \ge 3$, so $n = 2m - 1$ for
some $m \ge 2$. It is easily verified that the function $x/\log x$ is increasing for $x \ge 3$;
therefore (using also that $2m > 2$ is not prime),

$$\pi(2m-1) = \pi(2m) \ge \tfrac{1}{2}(\log 2)(2m)/\log(2m) \ge \tfrac{1}{2}(\log 2)(2m-1)/\log(2m-1).$$

That proves the theorem for odd $n$. □

As a consequence of the above theorem, we have $\pi(x) = \Omega(x/\log x)$ for real
numbers $x$. Indeed, setting $c := \frac{1}{2}(\log 2)$, for every real number $x \ge 2$, we have

$$\pi(x) = \pi(\lfloor x \rfloor) \ge c\lfloor x \rfloor/\log\lfloor x \rfloor \ge c(x-1)/\log x;$$

from this, it is clear that $\pi(x) = \Omega(x/\log x)$.
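Theorem 5.4 can also be checked numerically for small $n$. The following sketch (helper `pi_table` is ours) tabulates $\pi(n)$ with a sieve and tests $\pi(n) \ge \frac{1}{2}(\log 2)n/\log n$ for $2 \le n \le 10^4$; at $n = 2$ the two sides are exactly equal, and the floating-point expression evaluates to exactly 1.0 there.

```python
import math
from itertools import accumulate


def pi_table(limit: int) -> list:
    """Return a list pi with pi[n] = number of primes <= n, for 0 <= n <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(limit) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, limit + 1, k)))
    return list(accumulate(is_prime))      # running count of primes


LIMIT = 10_000
pi = pi_table(LIMIT)
c = 0.5 * math.log(2)
ok = all(pi[n] >= c * n / math.log(n) for n in range(2, LIMIT + 1))
print(ok)   # True
```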

To obtain a corresponding upper bound for $\pi(x)$, we introduce an auxiliary function,
called Chebyshev’s theta function:

$$\vartheta(x) := \sum_{p \le x} \log p,$$

where the sum is over all primes $p$ up to $x$.

Chebyshev’s theta function is an example of a summation over primes, and in
this chapter, we will be considering a number of functions that are defined in terms
of sums or products over primes (and indeed, such summations already cropped up
in the proof of Theorem 5.4). To avoid excessive tedium, we adopt the usual convention
used by number theorists: if not explicitly stated, summations and products
over the variable $p$ are always understood to be over primes. For example, we may
write $\pi(x) = \sum_{p \le x} 1$.

Theorem 5.5. We have

$$\vartheta(x) = \Theta(\pi(x)\log x).$$

Proof. On the one hand, we have

$$\vartheta(x) = \sum_{p \le x} \log p \le \log x \sum_{p \le x} 1 = \pi(x)\log x.$$

On the other hand, we have

$$\vartheta(x) = \sum_{p \le x} \log p \ge \sum_{x^{1/2} < p \le x} \log p \ge \frac{1}{2}\log x \sum_{x^{1/2} < p \le x} 1 = \frac{1}{2}\log x\,\bigl(\pi(x) - \pi(x^{1/2})\bigr) = \frac{1}{2}\bigl(1 - \pi(x^{1/2})/\pi(x)\bigr)\pi(x)\log x.$$

It will therefore suffice to show that $\pi(x^{1/2})/\pi(x) = o(1)$. Clearly, $\pi(x^{1/2}) \le x^{1/2}$.
Moreover, by the previous theorem, $\pi(x) = \Omega(x/\log x)$. Therefore,

$$\pi(x^{1/2})/\pi(x) = O(\log x/x^{1/2}) = o(1),$$

and the theorem follows. □

Theorem 5.6. $\vartheta(x) < 2(\log 2)x$ for every real number $x \ge 1$.

Proof. It suffices to prove that $\vartheta(n) < 2(\log 2)n$ for every positive integer $n$, since
then $\vartheta(x) = \vartheta(\lfloor x \rfloor) < 2(\log 2)\lfloor x \rfloor \le 2(\log 2)x$. We prove this by induction on $n$.
For $n = 1$ and $n = 2$, this is clear, so assume $n > 2$. If $n$ is even, then using the
induction hypothesis for $n - 1$, we have

$$\vartheta(n) = \vartheta(n-1) < 2(\log 2)(n-1) < 2(\log 2)n.$$

Now consider the case where $n$ is odd. Write $n = 2m + 1$, where $m$ is a positive
integer, and consider the binomial coefficient

$$M := \binom{2m+1}{m} = \frac{(2m+1)\cdots(m+2)}{m!}.$$

Observe that $M$ is divisible by all primes $p$ with $m + 1 < p \le 2m + 1$ (each such prime
divides the numerator but not $m!$). Moreover, by Lemma 5.2, we have $M < 2^{2m}$. It follows that

$$\vartheta(2m+1) - \vartheta(m+1) = \sum_{m+1 < p \le 2m+1} \log p \le \log M < 2(\log 2)m.$$

Using this, and the induction hypothesis for $m + 1$, we obtain

$$\vartheta(n) = \vartheta(2m+1) - \vartheta(m+1) + \vartheta(m+1) < 2(\log 2)m + 2(\log 2)(m+1) = 2(\log 2)n. \quad □$$

Another way of stating the above theorem is:

$$\prod_{p \le x} p < 4^x.$$
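The product form of Theorem 5.6 has the advantage that it can be checked exactly, with no floating-point logarithms, since Python integers have arbitrary precision. This sketch (our own, covering integers $1 \le n \le 2000$ only) maintains the running product of the primes up to $n$ and compares it with $4^n$.

```python
import math


def primes_up_to(n: int) -> list:
    """All primes p <= n, by the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return [k for k in range(n + 1) if is_prime[k]]


LIMIT = 2000
prime_set = set(primes_up_to(LIMIT))
product, ok = 1, True
for n in range(1, LIMIT + 1):
    if n in prime_set:
        product *= n                  # product of all primes p <= n
    ok = ok and product < 4 ** n      # exact integer comparison
print(ok)   # True
```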

Theorem 5.1 follows immediately from Theorems 5.4, 5.5 and 5.6. Note that we 
have also proved: 

Theorem 5.7. We have

$$\vartheta(x) = \Theta(x).$$


Exercise 5.1. For each positive integer $n$, let $p_n$ denote the $n$th prime. Show that
$p_n = \Theta(n\log n)$.

Exercise 5.2. For each positive integer $n$, let $\omega(n)$ denote the number of distinct
primes dividing $n$. Show that $\omega(n) = O(\log n/\log\log n)$.

Exercise 5.3. Show that $\sum_{p \le x} 1/\log p = \Theta(x/(\log x)^2)$.


5.2 Bertrand’s postulate 

Suppose we want to know how many primes there are of a given bit length, or
more generally, how many primes there are between $m$ and $2m$ for a given positive
integer $m$. Neither the statement, nor our proof, of Chebyshev’s theorem imply that
there are any primes between $m$ and $2m$, let alone a useful density estimate of such
primes.

Bertrand’s postulate is the assertion that for every positive integer $m$, there
exists a prime between $m$ and $2m$. We shall in fact prove a stronger result: there is
at least one prime between $m$ and $2m$, and moreover, the number of such primes is
$\Omega(m/\log m)$.

Theorem 5.8 (Bertrand’s postulate). For every positive integer $m$, we have

$$\pi(2m) - \pi(m) > \frac{m}{3\log(2m)}.$$

The proof uses Theorem 5.6, along with a more careful re-working of the proof
of Theorem 5.4. The theorem is clearly true for $m \le 2$, so we may assume that
$m \ge 3$. As in the proof of Theorem 5.4, define $N := \binom{2m}{m}$, and recall that $N$ is
divisible only by primes less than $2m$, and that we have the identity

$$v_p(N) = \sum_{k \ge 1} \bigl(\lfloor 2m/p^k \rfloor - 2\lfloor m/p^k \rfloor\bigr), \qquad (5.1)$$

where each term in the sum is either 0 or 1. We can characterize the values $v_p(N)$
a bit more precisely, as follows:

Lemma 5.9. Let $m \ge 3$ and $N := \binom{2m}{m}$. For all primes $p$, we have:

$$p^{v_p(N)} \le 2m; \qquad (5.2)$$
$$\text{if } p > \sqrt{2m}, \text{ then } v_p(N) \le 1; \qquad (5.3)$$
$$\text{if } 2m/3 < p \le m, \text{ then } v_p(N) = 0; \qquad (5.4)$$
$$\text{if } m < p < 2m, \text{ then } v_p(N) = 1. \qquad (5.5)$$
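Before the proof, a small sanity check may help in reading the four cases: the sketch below (our own helpers `primes_up_to` and `vp`) factors $N = \binom{2m}{m}$ one prime at a time and tests (5.2) through (5.5) for $3 \le m < 300$. The integer conditions $p^2 > 2m$ and $3p > 2m$ are equivalent to $p > \sqrt{2m}$ and $p > 2m/3$, and avoid floating point.

```python
import math


def primes_up_to(n: int) -> list:
    """All primes p <= n, by the sieve of Eratosthenes (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return [k for k in range(n + 1) if is_prime[k]]


def vp(m: int, p: int) -> int:
    """Exponent of the prime p in the positive integer m."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


def lemma_5_9_holds(m: int) -> bool:
    N = math.comb(2 * m, m)
    for p in primes_up_to(2 * m):
        v = vp(N, p)
        if p ** v > 2 * m:                          # (5.2)
            return False
        if p * p > 2 * m and v > 1:                 # (5.3): p > sqrt(2m)
            return False
        if 3 * p > 2 * m and p <= m and v != 0:     # (5.4): 2m/3 < p <= m
            return False
        if m < p < 2 * m and v != 1:                # (5.5)
            return False
    return True


print(all(lemma_5_9_holds(m) for m in range(3, 300)))   # True
```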


Proof. For (5.2), all terms with $k > \log(2m)/\log p$ in (5.1) vanish, and hence
$v_p(N) \le \log(2m)/\log p$, from which it follows that $p^{v_p(N)} \le 2m$.

(5.3) follows immediately from (5.2).

For (5.4), if $2m/3 < p \le m$, then $2m/p < 3$, and we must also have $p \ge 3$,
since $p = 2$ implies $m < 3$. We have $p^2 > p(2m/3) = 2m(p/3) \ge 2m$, and hence
all terms with $k > 1$ in (5.1) vanish. The term with $k = 1$ also vanishes, since
$1 \le m/p < 3/2$, from which it follows that $2 \le 2m/p < 3$, and hence $\lfloor m/p \rfloor = 1$
and $\lfloor 2m/p \rfloor = 2$.

For (5.5), if $m < p < 2m$, it follows that $1 < 2m/p < 2$, so $\lfloor 2m/p \rfloor = 1$. Also,
$m/p < 1$, so $\lfloor m/p \rfloor = 0$. It follows that the term with $k = 1$ in (5.1) is 1, and it is
clear that $2m/p^k < 1$ for all $k > 1$, and so all the other terms vanish. □

We now have the necessary technical ingredients to prove Theorem 5.8. Define

$$P_m := \prod_{m < p < 2m} p,$$

and define $Q_m$ so that

$$N = Q_m P_m.$$

By (5.4) and (5.5), we see that

$$Q_m = \prod_{p \le 2m/3} p^{v_p(N)}.$$

Moreover, by (5.3), $v_p(N) > 1$ for at most those $p \le \sqrt{2m}$, so there are at most
$\sqrt{2m}$ such primes, and by (5.2), the contribution of each such prime to the above
product is at most $2m$. Combining this with Theorem 5.6, we obtain

$$Q_m < (2m)^{\sqrt{2m}} \cdot 4^{2m/3}.$$

We now apply Lemma 5.2, obtaining

$$P_m = N Q_m^{-1} \ge 2^{2m}(2m)^{-1} Q_m^{-1} > 4^{m/3}(2m)^{-(1+\sqrt{2m})}.$$

It follows that

$$\pi(2m) - \pi(m) \ge \frac{\log P_m}{\log(2m)} > \frac{m\log 4}{3\log(2m)} - (1 + \sqrt{2m}) = \frac{m}{3\log(2m)} + \frac{m(\log 4 - 1)}{3\log(2m)} - (1 + \sqrt{2m}).$$

Clearly, for all sufficiently large $m$, we have

$$\frac{m(\log 4 - 1)}{3\log(2m)} > 1 + \sqrt{2m}. \qquad (5.6)$$

That proves Theorem 5.8 for all sufficiently large $m$. Moreover, a simple calculation
shows that (5.6) holds for all $m \ge 13{,}000$, and one can verify by brute force
(with the aid of a computer) that the theorem holds for $m < 13{,}000$.
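The brute-force verification mentioned above is easy to reproduce. This sketch (our own; the helper `pi_table` is hypothetical) tabulates $\pi$ up to $26{,}000$ with a sieve and checks the inequality of Theorem 5.8 for every $1 \le m < 13{,}000$, using floating-point logarithms for the right-hand side.

```python
import math
from itertools import accumulate


def pi_table(limit: int) -> list:
    """pi[n] = number of primes <= n for 0 <= n <= limit (sieve + running count)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(limit) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, limit + 1, k)))
    return list(accumulate(is_prime))


def theorem_5_8_holds_below(bound: int) -> bool:
    """Check pi(2m) - pi(m) > m / (3 log(2m)) for every 1 <= m < bound."""
    pi = pi_table(2 * bound)
    return all(pi[2 * m] - pi[m] > m / (3 * math.log(2 * m)) for m in range(1, bound))


print(theorem_5_8_holds_below(13_000))   # True
```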


5.3 Mertens’ theorem 

Our next goal is to prove the following theorem, which turns out to have a number
of applications.

Theorem 5.10. We have

$$\sum_{p \le x} \frac{1}{p} = \log\log x + O(1).$$

The proof of this theorem, while not difficult, is a bit technical, and we proceed
in several steps.

Theorem 5.11. We have

$$\sum_{p \le x} \frac{\log p}{p} = \log x + O(1).$$


Proof. Let $n := \lfloor x \rfloor$. The idea of the proof is to estimate $\log(n!)$ in two different
ways. By Lemma 5.3, we have

$$\log(n!) = \sum_{p \le n}\sum_{k \ge 1}\lfloor n/p^k \rfloor \log p = \sum_{p \le n}\lfloor n/p \rfloor \log p + \sum_{p \le n}\sum_{k \ge 2}\lfloor n/p^k \rfloor \log p.$$

We next show that the last sum is $O(n)$. We have

$$\sum_{p \le n}\log p\sum_{k \ge 2}\lfloor n/p^k \rfloor \le n\sum_{p \le n}\log p\sum_{k \ge 2}p^{-k} = n\sum_{p \le n}\frac{\log p}{p^2}\cdot\frac{1}{1 - 1/p} = n\sum_{p \le n}\frac{\log p}{p(p-1)} \le n\sum_{k \ge 2}\frac{\log k}{k(k-1)} = O(n).$$

Thus, we have shown that

$$\log(n!) = \sum_{p \le n}\lfloor n/p \rfloor \log p + O(n).$$

Since $\lfloor n/p \rfloor = n/p + O(1)$, applying Theorem 5.6 (and Exercise 3.12), we obtain

$$\log(n!) = \sum_{p \le n}(n/p)\log p + O\Bigl(\sum_{p \le n}\log p\Bigr) + O(n) = n\sum_{p \le n}\frac{\log p}{p} + O(n). \qquad (5.7)$$

We can also estimate $\log(n!)$ by estimating a sum by an integral (see §A5):

$$\log(n!) = \sum_{k=1}^{n}\log k = \int_1^n \log t\,dt + O(\log n) = n\log n - n + O(\log n). \qquad (5.8)$$

Combining (5.7) and (5.8), and noting that $\log x - \log n = o(1)$ (see Exercise 3.11),
we obtain

$$\sum_{p \le x}\frac{\log p}{p} = \log n + O(1) = \log x + O(1),$$

which proves the theorem. □
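A quick numerical look at Theorem 5.11: the sketch below (our own helper names) computes $\sum_{p \le x} (\log p)/p - \log x$ for $x = 10, 10^2, \ldots, 10^6$. The theorem only asserts that this difference is $O(1)$; in this range the computed values stay small in absolute value.

```python
import math


def primes_up_to(n: int) -> list:
    """All primes p <= n, by the sieve of Eratosthenes (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return [k for k in range(n + 1) if is_prime[k]]


def log_weighted_gap(x: int) -> float:
    """sum_{p <= x} (log p)/p  minus  log x."""
    return sum(math.log(p) / p for p in primes_up_to(x)) - math.log(x)


gaps = [log_weighted_gap(10 ** k) for k in range(1, 7)]
print([round(g, 3) for g in gaps])
```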


We shall also need the following theorem, which is a very useful tool in its own
right; it is essentially a discrete variant of “integration by parts.”

Theorem 5.12 (Abel’s identity). Let $\{c_i\}_{i=k}^{\infty}$ be a sequence of real numbers, and
for each real number $t$, define

$$C(t) := \sum_{k \le i \le t} c_i.$$

Further, suppose that $f(t)$ is a function with a continuous derivative $f'(t)$ on the
interval $[k, x]$, where $x$ is a real number, with $x \ge k$. Then

$$\sum_{k \le i \le x} c_i f(i) = C(x)f(x) - \int_k^x C(t)f'(t)\,dt.$$

Note that since $C(t)$ is a step function, the integrand $C(t)f'(t)$ is piece-wise
continuous on $[k, x]$, and hence the integral is well defined (see §A4).

Proof. Let $n := \lfloor x \rfloor$. We have

$$\begin{aligned}
\sum_{i=k}^{n} c_i f(i) &= C(k)f(k) + \sum_{i=k+1}^{n}[C(i) - C(i-1)]f(i) \\
&= \sum_{i=k}^{n-1} C(i)[f(i) - f(i+1)] + C(n)f(n) \\
&= \sum_{i=k}^{n-1} C(i)[f(i) - f(i+1)] + C(n)[f(n) - f(x)] + C(x)f(x).
\end{aligned}$$

Observe that for $i = k, \ldots, n-1$, we have $C(t) = C(i)$ for all $t \in [i, i+1)$, and so

$$C(i)[f(i) - f(i+1)] = -C(i)\int_i^{i+1} f'(t)\,dt = -\int_i^{i+1} C(t)f'(t)\,dt;$$

likewise,

$$C(n)[f(n) - f(x)] = -\int_n^x C(t)f'(t)\,dt,$$

from which the theorem directly follows. □


Proof of Theorem 5.10. For $i \ge 2$, set

$$c_i := \begin{cases} (\log i)/i & \text{if } i \text{ is prime,} \\ 0 & \text{otherwise.} \end{cases}$$

By Theorem 5.11, we have

$$C(t) := \sum_{2 \le i \le t} c_i = \sum_{p \le t}\frac{\log p}{p} = \log t + R(t),$$

where $R(t) = O(1)$. Applying Theorem 5.12 with $f(t) := 1/\log t$ (and using
Exercise 3.13), we obtain

$$\begin{aligned}
\sum_{p \le x}\frac{1}{p} = \sum_{2 \le i \le x} c_i f(i) &= \frac{C(x)}{\log x} + \int_2^x \frac{C(t)}{t(\log t)^2}\,dt \\
&= 1 + \frac{R(x)}{\log x} + \int_2^x \frac{dt}{t\log t} + \int_2^x \frac{R(t)}{t(\log t)^2}\,dt \\
&= 1 + O(1/\log x) + (\log\log x - \log\log 2) + O(1) \\
&= \log\log x + O(1).
\end{aligned}$$

□
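Theorem 5.10 only says that $\sum_{p \le x} 1/p - \log\log x$ stays bounded. Numerically, the difference appears to settle near the constant $A \approx 0.2615$ quoted in Exercise 5.11; the following sketch (helper names ours) computes it for a few powers of ten.

```python
import math


def primes_up_to(n: int) -> list:
    """All primes p <= n, by the sieve of Eratosthenes (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return [k for k in range(n + 1) if is_prime[k]]


def reciprocal_gap(x: int) -> float:
    """sum_{p <= x} 1/p  minus  log log x."""
    return sum(1 / p for p in primes_up_to(x)) - math.log(math.log(x))


for k in (3, 4, 5, 6):
    print(f"x = 10^{k}: {reciprocal_gap(10 ** k):.4f}")
```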


Using Theorem 5.10, we can easily show the following: 


Theorem 5.13 (Mertens’ theorem). We have

$$\prod_{p \le x}(1 - 1/p) = \Theta(1/\log x).$$

Proof. Using parts (i) and (iii) of §A1, for any fixed prime $p$, we have

$$-\frac{1}{p^2} \le \frac{1}{p} + \log(1 - 1/p) \le 0. \qquad (5.9)$$

Moreover, since

$$\sum_{p \le x}\frac{1}{p^2} \le \sum_{i \ge 2}\frac{1}{i^2} < \infty,$$

summing the inequality (5.9) over all primes $p \le x$ yields

$$-C \le \sum_{p \le x}\frac{1}{p} + \log g(x) \le 0,$$

where $C$ is a positive constant, and $g(x) := \prod_{p \le x}(1 - 1/p)$. From this, and
from Theorem 5.10, we obtain $\log g(x) = -\log\log x + O(1)$, which implies that
$g(x) = \Theta(1/\log x)$ (see Exercise 3.11). That proves the theorem. □
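Numerically, the product in Mertens’ theorem behaves like a constant times $1/\log x$. The sketch below (our own helper) prints $\log x \cdot \prod_{p \le x}(1 - 1/p)$, which the theorem says is bounded above and below by positive constants; the values come out close to the constant $B_1 \approx 0.5615$ of Exercise 5.12.

```python
import math


def primes_up_to(n: int) -> list:
    """All primes p <= n, by the sieve of Eratosthenes (n >= 2)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            is_prime[k * k :: k] = bytearray(len(range(k * k, n + 1, k)))
    return [k for k in range(n + 1) if is_prime[k]]


def mertens_scaled(x: int) -> float:
    """log(x) * prod_{p <= x} (1 - 1/p)."""
    prod = 1.0
    for p in primes_up_to(x):
        prod *= 1 - 1 / p
    return prod * math.log(x)


for k in (2, 4, 6):
    print(f"x = 10^{k}: {mertens_scaled(10 ** k):.4f}")
```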


Exercise 5.4. For each positive integer $k$, let $P_k$ denote the product of the first $k$
primes. Show that $\varphi(P_k) = \Theta(P_k/\log\log P_k)$ (here, $\varphi$ is Euler’s phi function).

Exercise 5.5. The previous exercise showed that $\varphi(n)$ could be as small as
(about) $n/\log\log n$ for infinitely many $n$. Show that this is the “worst case,” in
the sense that $\varphi(n) = \Omega(n/\log\log n)$.

Exercise 5.6. Show that for every positive integer constant $k$,

$$\int_2^x \frac{dt}{(\log t)^k} = \frac{x}{(\log x)^k} + O\!\left(\frac{x}{(\log x)^{k+1}}\right).$$

This fact may be useful in some of the following exercises.

Exercise 5.7. Use Chebyshev’s theorem and Abel’s identity to prove a stronger
version of Theorem 5.5: $\vartheta(x) = \pi(x)\log x + O(x/\log x)$.

Exercise 5.8. Use Chebyshev’s theorem and Abel’s identity to show that

$$\sum_{p \le x}\frac{1}{\log p} = \frac{\pi(x)}{\log x} + O\!\left(\frac{x}{(\log x)^3}\right).$$

Exercise 5.9. Show that

$$\prod_{2 < p \le x}(1 - 2/p) = \Theta(1/(\log x)^2).$$

Exercise 5.10. Show that if $\pi(x) \sim cx/\log x$ for some constant $c$, then we must
have $c = 1$.

Exercise 5.11. Strengthen Theorem 5.10: show that for some constant $A$, we
have $\sum_{p \le x} 1/p = \log\log x + A + o(1)$. You do not need to estimate $A$, but in fact
$A \approx 0.261497212847643$.

Exercise 5.12. Use the result from the previous exercise to strengthen Mertens’
theorem: show that for some constant $B_1$, we have $\prod_{p \le x}(1 - 1/p) \sim B_1/\log x$.
You do not need to estimate $B_1$, but in fact $B_1 \approx 0.561459483566885$.

Exercise 5.13. Strengthen the result of Exercise 5.9: show that for some constant
$B_2$, we have

$$\prod_{2 < p \le x}(1 - 2/p) \sim B_2/(\log x)^2.$$

You do not need to estimate $B_2$, but in fact $B_2 \approx 0.832429065662$.


Exercise 5.14. Use Abel’s identity to derive Euler’s summation formula: if
$f(t)$ has a continuous derivative $f'(t)$ on the interval $[a, b]$, where $a$ and $b$ are
integers, then

$$\sum_{i=a}^{b} f(i) - \int_a^b f(t)\,dt = f(a) + \int_a^b (t - \lfloor t \rfloor) f'(t)\,dt.$$

Exercise 5.15. Use Euler’s summation formula (previous exercise) to show that

$$\log(n!) = n\log n - n + \tfrac{1}{2}\log n + O(1),$$

and from this, conclude that $n! = \Theta((n/e)^n\sqrt{n})$. This is a weak form of Stirling’s
approximation; a sharper form states that $n! \sim (n/e)^n\sqrt{2\pi n}$.

Exercise 5.16. Use Stirling’s approximation (previous exercise) to show that

$$\binom{2m}{m} = \Theta(2^{2m}/\sqrt{m}).$$


5.4 The sieve of Eratosthenes 

As an application of Theorem 5.10, consider the sieve of Eratosthenes. This is
an algorithm that generates all the primes up to a given bound $n$. It uses an array
$A[2 \ldots n]$, and runs as follows.

    for k ← 2 to n do A[k] ← 1
    for k ← 2 to ⌊√n⌋ do
        if A[k] = 1 then
            i ← 2k
            while i ≤ n do
                A[i] ← 0, i ← i + k


When the algorithm finishes, we have $A[k] = 1$ if and only if $k$ is prime, for
$k = 2, \ldots, n$. This can easily be proven using the fact (see Exercise 1.2) that a
composite number $k$ between 2 and $n$ must be divisible by a prime that is at most
$\sqrt{n}$, and by proving by induction on $k$ that at the beginning of each iteration of
the main loop, $A[i] = 0$ if and only if $i$ is divisible by a prime less than $k$, for
$i = k, \ldots, n$. We leave the details of this to the reader.
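Here is a direct Python transcription of the pseudocode above, offered as a sketch (the function name `sieve` is ours; it assumes $n \ge 1$). It keeps the text’s choice of starting the inner loop at $2k$; a common refinement starts at $k^2$ instead, which does not change the asymptotic analysis below.

```python
import math


def sieve(n: int) -> list:
    """Sieve of Eratosthenes following the pseudocode: returns A with
    A[k] == 1 iff k is prime, for 2 <= k <= n (A[0], A[1] unused, set to 0)."""
    A = [0, 0] + [1] * (n - 1)                # for k <- 2 to n do A[k] <- 1
    for k in range(2, math.isqrt(n) + 1):     # for k <- 2 to floor(sqrt(n))
        if A[k] == 1:                         # only primes k reach the inner loop
            for i in range(2 * k, n + 1, k):  # i <- 2k, 3k, ... while i <= n
                A[i] = 0
    return A


print([k for k, flag in enumerate(sieve(30)) if flag])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```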

We are more interested in the running time of the algorithm. To analyze the
running time, we assume that all arithmetic operations take constant time; this
is reasonable, since all the numbers computed are used as array indices and thus
should fit in single machine words. Therefore, we can assume that built-in arithmetic
instructions are used for operating on such numbers.

Every time we execute the inner loop of the algorithm, we perform $O(n/k)$ steps
to clear the entries of $A$ indexed by multiples of $k$. Pessimistically, then, we could
bound the total running time by $O(nT(n))$, where

$$T(n) := \sum_{k \le \sqrt{n}} 1/k.$$

Estimating the sum by an integral (see §A5), we have

$$T(n) = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} 1/k = \int_1^{\sqrt{n}} \frac{dy}{y} + O(1) \sim \frac{1}{2}\log n.$$

This implies an $O(n\operatorname{len}(n))$ bound on the running time of the algorithm. However,
this rather crude analysis ignores the fact that the inner loop is executed only for
prime values of $k$; taking this fact into account, we see that the running time is
$O(nT_1(n))$, where

$$T_1(n) := \sum_{p \le \sqrt{n}} 1/p.$$

By Theorem 5.10, $T_1(n) = \log\log n + O(1)$, which implies an $O(n\operatorname{len}(\operatorname{len}(n)))$
bound on the running time of the algorithm. This is a substantial improvement
over the above, rather crude analysis.
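To see the $O(n\operatorname{len}(\operatorname{len}(n)))$ behavior concretely, the following sketch (our own instrumentation of the pseudocode) counts how many times the assignment $A[i] \leftarrow 0$ executes and compares the count with $n\log\log n$. For each prime $k \le \sqrt{n}$ the inner loop runs exactly $\lfloor n/k \rfloor - 1$ times, so the count is at most $nT_1(n)$.

```python
import math


def sieve_work(n: int) -> int:
    """Run the sieve on A[2..n] and count executions of the assignment A[i] <- 0."""
    A = [0, 0] + [1] * (n - 1)
    steps = 0
    for k in range(2, math.isqrt(n) + 1):
        if A[k] == 1:
            for i in range(2 * k, n + 1, k):
                A[i] = 0
                steps += 1
    return steps


for n in (10**4, 10**5, 10**6):
    w = sieve_work(n)
    print(n, w, round(w / (n * math.log(math.log(n))), 3))   # ratio stays below 1
```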


Exercise 5.17. Give a detailed proof of the correctness of the above algorithm. 

Exercise 5.18. One drawback of the above algorithm is its use of space: it
requires an array of size $n$. Show how to modify the algorithm, without substantially
increasing its running time, so that one can enumerate all the primes up to $n$,
using an auxiliary array of size just $O(\sqrt{n})$.

Exercise 5.19. Design and analyze an algorithm that on input $n$ outputs the table
of values $\tau(k)$ for $k = 1, \ldots, n$, where $\tau(k)$ is the number of positive divisors of $k$.
Your algorithm should run in time $O(n\operatorname{len}(n))$.


5.5 The prime number theorem . . . and beyond 

In this section, we survey a number of theorems and conjectures related to the 
distribution of primes. This is a vast area of mathematical research, with a number 
of very deep results. We shall be stating a number of theorems from the literature 
in this section without proof; while our intent is to keep the text as self-contained as
possible, and to avoid degenerating into “mathematical tourism,” it nevertheless is a 
good idea to occasionally have a somewhat broader perspective. In the subsequent 
chapters, we shall not make any critical use of the theorems in this section. 


5.5.1 The prime number theorem 

The main theorem in the theory of the density of primes is the following. 

Theorem 5.14 (Prime number theorem). We have

$$\pi(x) \sim x/\log x.$$

Proof. Literature — see §5.6. □ 

As we saw in Exercise 5.10, if $\pi(x)/(x/\log x)$ tends to a limit as $x \to \infty$, then
the limit must be 1, so in fact the hard part of proving the prime number theorem
is to show that $\pi(x)/(x/\log x)$ does indeed tend to some limit.
Exercise 5.20. Using the prime number theorem, show that $\vartheta(x) \sim x$.

Exercise 5.21. Using the prime number theorem, show that $p_n \sim n \log n$, where
$p_n$ denotes the nth prime.

Exercise 5.22. Using the prime number theorem, show that Bertrand’s postulate
can be strengthened (asymptotically) as follows: for every $\varepsilon > 0$, there exist
positive constants $c$ and $x_0$, such that for all $x \ge x_0$, we have

$$\pi((1 + \varepsilon)x) - \pi(x) \ge c \frac{x}{\log x}.$$


5.5.2 The error term in the prime number theorem 
The prime number theorem says that

$$|\pi(x) - x/\log x| \le \delta(x),$$

where $\delta(x) = o(x/\log x)$. A natural question is: how small is the “error term”
$\delta(x)$? It can be shown that

$$\pi(x) = x/\log x + O(x/(\log x)^2). \qquad (5.10)$$

This bound on the error term is not very impressive, but unfortunately, it cannot
be improved upon. The problem is that $x/\log x$ is not really the best “simple”
function that approximates $\pi(x)$. It turns out that a better approximation to $\pi(x)$ is
the logarithmic integral, defined for all real numbers $x \ge 2$ as

$$\mathrm{li}(x) := \int_2^x \frac{dt}{\log t}.$$

It is not hard to show (see Exercise 5.6) that

$$\mathrm{li}(x) = x/\log x + O(x/(\log x)^2). \qquad (5.11)$$

Thus, $\mathrm{li}(x) \sim x/\log x \sim \pi(x)$. However, the error term in the approximation
of $\pi(x)$ by $\mathrm{li}(x)$ is much better. This is illustrated numerically in Table 5.2; for
example, at $x = 10^{18}$, $\mathrm{li}(x)$ approximates $\pi(x)$ with a relative error just under
$10^{-9}$, while $x/\log x$ approximates $\pi(x)$ with a relative error of about 0.025.

The sharpest proven result on the error in approximating $\pi(x)$ by $\mathrm{li}(x)$ is the
following: 

Theorem 5.15. Let $\kappa(x) := (\log x)^{3/5} (\log\log x)^{-1/5}$. Then for some $c > 0$, we
have

$$\pi(x) = \mathrm{li}(x) + O(x e^{-c\kappa(x)}).$$


Proof. Literature — see §5.6. □ 
Table 5.2. Values of $\pi(x)$, $\mathrm{li}(x)$, and $x/\log x$

x        π(x)                 li(x)                  x/log x
10^3     168                  176.6                  144.8
10^6     78498                78626.5                72382.4
10^9     50847534             50849233.9             48254942.4
10^12    37607912018          37607950279.8          36191206825.3
10^15    29844570422669       29844571475286.5       28952965460216.8
10^18    24739954287740860    24739954309690414.0    24127471216847323.8
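The first two rows of Table 5.2 can be reproduced with a short Python sketch. It assumes li(x) is the integral of dt/log t from 2 to x (the convention noted in §5.6) and evaluates that integral numerically with composite Simpson's rule after the substitution $t = e^u$, which turns the integrand into the smooth function $e^u/u$; the step count is an arbitrary choice, not something from the text. It also spot-checks the bound of Conjecture 5.16 at these two points, which of course proves nothing in general.

```python
import math


def li(x, steps=20000):
    """li(x) = integral of dt/log t from 2 to x, via Simpson's rule in u = log t."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps                      # steps must be even for Simpson's rule
    f = lambda u: math.exp(u) / u            # dt/log t = (e^u / u) du
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


def prime_count(n):
    """pi(n), counted with a bytearray sieve of Eratosthenes (requires n >= 1)."""
    A = bytearray([1]) * (n + 1)
    A[0:2] = b"\x00\x00"
    for k in range(2, math.isqrt(n) + 1):
        if A[k]:
            A[k * k :: k] = bytes(len(range(k * k, n + 1, k)))
    return sum(A)


if __name__ == "__main__":
    for x in (10**3, 10**6):
        p, L, r = prime_count(x), li(x), x / math.log(x)
        print(x, p, round(L, 1), round(r, 1))
        # the bound of Conjecture 5.16, checked only at this one x
        print("  |pi - li| < sqrt(x) log x:", abs(p - L) < math.sqrt(x) * math.log(x))
```

Running it should print 168, 176.6, 144.8 for $x = 10^3$ and 78498, 78626.5, 72382.4 for $x = 10^6$, matching the table.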


Note that the error term $x e^{-c\kappa(x)}$ is $o(x/(\log x)^k)$ for every fixed $k > 0$. Also
note that (5.10) follows directly from (5.11) and Theorem 5.15.

Although the above estimate on the error term in the approximation of $\pi(x)$ by
li(x) is pretty good, it is conjectured that the actual error term is much smaller: 

Conjecture 5.16. For all $x \ge 2.01$, we have

$$|\pi(x) - \mathrm{li}(x)| < x^{1/2} \log x.$$

Conjecture 5.16 is equivalent to the famous Riemann hypothesis, which is a 
conjecture about the location of the zeros of a certain function, called Riemann’s 
zeta function. We give a very brief, high-level account of this conjecture, and its 
connection to the theory of the distribution of primes. 

For all real numbers $s > 1$, the zeta function is defined as

$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}. \qquad (5.12)$$

Note that because $s > 1$, the infinite series defining $\zeta(s)$ converges. A simple, but
important, connection between the zeta function and the theory of prime numbers 
is the following: 

Theorem 5.17 (Euler’s identity). For every real number $s > 1$, we have

$$\zeta(s) = \prod_p (1 - p^{-s})^{-1}, \qquad (5.13)$$

where the product is over all primes p.

Proof. The rigorous interpretation of the infinite product on the right-hand side
of (5.13) is as a limit of finite products. Thus, if $p_i$ denotes the $i$th prime, for
$i = 1, 2, \ldots$, then we are really proving that

$$\zeta(s) = \lim_{r \to \infty} \prod_{i=1}^{r} (1 - p_i^{-s})^{-1}.$$

Now, from the identity

$$(1 - p_i^{-s})^{-1} = \sum_{e=0}^{\infty} p_i^{-es},$$

we have

$$\prod_{i=1}^{r} (1 - p_i^{-s})^{-1} = (1 + p_1^{-s} + p_1^{-2s} + \cdots) \cdots (1 + p_r^{-s} + p_r^{-2s} + \cdots) = \sum_{n=1}^{\infty} \frac{h_r(n)}{n^s},$$

where

$$h_r(n) := \begin{cases} 1 & \text{if } n \text{ is divisible only by the primes } p_1, \ldots, p_r; \\ 0 & \text{otherwise.} \end{cases}$$

Here, we have made use of the fact (see §A7) that we can multiply term-wise
infinite series with non-negative terms.

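Now, for every $\varepsilon > 0$, there exists $n_0$ such that $\sum_{n=n_0}^{\infty} n^{-s} < \varepsilon$ (because the series
defining $\zeta(s)$ converges). Moreover, there exists an $r_0$ such that $h_r(n) = 1$ for all
$n < n_0$ and $r \ge r_0$. Therefore, for all $r \ge r_0$, we have

$$\left| \sum_{n=1}^{\infty} \frac{h_r(n)}{n^s} - \zeta(s) \right| \le \sum_{n=n_0}^{\infty} n^{-s} < \varepsilon.$$

It follows that

$$\lim_{r \to \infty} \sum_{n=1}^{\infty} \frac{h_r(n)}{n^s} = \zeta(s),$$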
which proves the theorem. □ 
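Numerically, the picture behind this proof is easy to see at $s = 2$, where $\zeta(2) = \pi^2/6$: both a truncated series (5.12) and a finite Euler product over the primes up to some bound approach the same value. The Python sketch below is illustrative only; the cutoffs are arbitrary, floating-point arithmetic stands in for exact values, and the helper names are hypothetical.

```python
import math


def primes_upto(n):
    """List of primes <= n (requires n >= 1), via a sieve of Eratosthenes."""
    A = bytearray([1]) * (n + 1)
    A[0:2] = b"\x00\x00"
    for k in range(2, math.isqrt(n) + 1):
        if A[k]:
            A[k * k :: k] = bytes(len(range(k * k, n + 1, k)))
    return [i for i in range(n + 1) if A[i]]


def zeta_partial(s, N):
    """Truncation of (5.12): the sum of n^(-s) for n = 1, ..., N."""
    return sum(n ** -s for n in range(1, N + 1))


def euler_partial(s, bound):
    """Finite Euler product: the product of (1 - p^(-s))^(-1) over primes p <= bound."""
    prod = 1.0
    for p in primes_upto(bound):
        prod /= 1 - p ** -s
    return prod


if __name__ == "__main__":
    target = math.pi ** 2 / 6
    for N in (10, 100, 10**4):
        print(N, target - zeta_partial(2, N), target - euler_partial(2, N))
```

Both differences shrink toward 0. The product over primes $p \le N$ equals $\sum_n h(n)/n^s$ over all n whose prime factors are at most N, which includes every $n \le N$, so its error is never larger than that of the truncated series.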


While Theorem 5.17 is nice, things become much more interesting if one extends
the domain of definition of the zeta function to the complex plane. For the reader
who is familiar with just a little complex analysis, it is easy to see that the infinite
series defining the zeta function in (5.12) converges absolutely for all complex
numbers s whose real part is greater than 1, and that (5.13) holds as well for such
s. However, it is possible to extend the domain of definition of $\zeta(s)$ even further;
in fact, one can extend the definition of $\zeta(s)$ in a “nice way” (in the language of
complex analysis, analytically continue) to the entire complex plane (except the
point $s = 1$, where there is a simple pole). Exactly how this is done is beyond the
scope of this text, but assuming this extended definition of $\zeta(s)$, we can now state
the Riemann hypothesis:
Conjecture 5.18 (Riemann hypothesis). Suppose s is a complex number with
$s = x + yi$, where $x, y \in \mathbb{R}$, such that $\zeta(s) = 0$ and $0 < x < 1$. Then $x = 1/2$.

A lot is known about the zeros of the zeta function in the “critical strip,” which 
consists of those points s whose real part is greater than 0 and less than 1: it is 
known that there are infinitely many such zeros, and there are even good estimates
about their density. It turns out that one can apply standard tools in complex analy- 
sis, like contour integration, to the zeta function (and functions derived from it) to 
answer various questions about the distribution of primes. Indeed, such techniques 
may be used to prove the prime number theorem. However, if one assumes the 
Riemann hypothesis, then these techniques yield much sharper results, such as the 
bound in Conjecture 5.16. 


Exercise 5.23. For any arithmetic function a (mapping positive integers to
reals), we can form the Dirichlet series

$$F_a(s) := \sum_{n=1}^{\infty} \frac{a(n)}{n^s}.$$

For simplicity we assume that s takes only real values, even though such series are
usually studied for complex values of s.

(a) Show that if the Dirichlet series $F_a(s)$ converges absolutely for some real
s, then it converges absolutely for all real $s' > s$.

(b) From part (a), conclude that for any given arithmetic function a, there is
an interval of absolute convergence of the form $(s_0, \infty)$, where we allow
$s_0 = -\infty$ and $s_0 = \infty$, such that $F_a(s)$ converges absolutely for $s > s_0$, and
does not converge absolutely for $s < s_0$.

(c) Let a and b be arithmetic functions such that $F_a(s)$ has an interval of absolute
convergence $(s_0, \infty)$ and $F_b(s)$ has an interval of absolute convergence
$(s_0', \infty)$, and assume that $s_0 < \infty$ and $s_0' < \infty$. Let $c := a \star b$
be the Dirichlet product of a and b, as defined in §2.9. Show that for all
$s \in (\max(s_0, s_0'), \infty)$, the series $F_c(s)$ converges absolutely and, moreover,
that $F_a(s)F_b(s) = F_c(s)$.
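Part (c) can be illustrated numerically in the simplest case $a = b = 1$ (the constant function), whose Dirichlet product is the divisor-counting function $\tau$, so the claim reads $\zeta(s)^2 = F_\tau(s)$ for $s > 1$. The Python sketch below is an illustration, not a proof; the truncation point N and the choice $s = 3$ are arbitrary, and `dirichlet_product` and `dirichlet_series` are hypothetical helper names.

```python
def dirichlet_product(a, b, N):
    """Values c(n) = sum over d*e = n of a(d) b(e), for n = 1, ..., N (index 0 unused)."""
    c = [0] * (N + 1)
    for d in range(1, N + 1):
        for e in range(1, N // d + 1):     # all e with d * e <= N
            c[d * e] += a(d) * b(e)
    return c


def dirichlet_series(a, s, N):
    """Truncated Dirichlet series: the sum of a(n) / n^s for n = 1, ..., N."""
    return sum(a(n) / n ** s for n in range(1, N + 1))


if __name__ == "__main__":
    N, s = 10**4, 3
    one = lambda n: 1
    tau = dirichlet_product(one, one, N)   # tau[n] = number of positive divisors of n
    lhs = dirichlet_series(one, s, N) ** 2
    rhs = dirichlet_series(lambda n: tau[n], s, N)
    print(tau[12], lhs, rhs, lhs - rhs)
```

The two truncations differ only by the terms $1/(de)^s$ with $d, e \le N$ but $de > N$, which is why the printed gap is tiny and non-negative.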


5.5.3 Explicit estimates 

Sometimes, it is useful to have explicit estimates for $\pi(x)$, as well as related
functions, like $\vartheta(x)$ and the nth prime function $p_n$. The following theorem presents a
number of bounds that have been proved without relying on any unproved
conjectures.





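Theorem 5.19. We have:

(i) $\frac{x}{\log x}\left(1 + \frac{1}{2\log x}\right) < \pi(x) < \frac{x}{\log x}\left(1 + \frac{3}{2\log x}\right)$, for $x \ge 59$;

(ii) $n(\log n + \log\log n - 3/2) < p_n < n(\log n + \log\log n - 1/2)$, for $n \ge 20$;

(iii) $x\left(1 - \frac{1}{2\log x}\right) < \vartheta(x) < x\left(1 + \frac{1}{2\log x}\right)$, for $x \ge 563$;

(iv) $\log\log x + A - \frac{1}{2(\log x)^2} < \sum_{p \le x} 1/p < \log\log x + A + \frac{1}{2(\log x)^2}$, for $x \ge 286$, where $A \approx 0.261497212847643$;

(v) $\frac{B_1}{\log x}\left(1 - \frac{1}{2(\log x)^2}\right) < \prod_{p \le x} (1 - 1/p) < \frac{B_1}{\log x}\left(1 + \frac{1}{2(\log x)^2}\right)$, for $x \ge 285$, where $B_1 \approx 0.561459483566885$.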
Proof. Literature — see §5.6. □ 
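Because these bounds are fully explicit, they can be spot-checked directly. The Python sketch below evaluates each inequality of Theorem 5.19 at one value of x (default $10^5$) and part (ii) at one value of n (default 1000), using a sieve; the evaluation points and the function name `check_theorem_5_19` are arbitrary choices for illustration, and a finite check like this says nothing beyond those points.

```python
import math


def primes_upto(n):
    """List of primes <= n (requires n >= 1), via a sieve of Eratosthenes."""
    A = bytearray([1]) * (n + 1)
    A[0:2] = b"\x00\x00"
    for k in range(2, math.isqrt(n) + 1):
        if A[k]:
            A[k * k :: k] = bytes(len(range(k * k, n + 1, k)))
    return [i for i in range(n + 1) if A[i]]


def check_theorem_5_19(x=10**5, n=1000):
    """Evaluate parts (i)-(v) of Theorem 5.19 at one x (and one n for part (ii))."""
    A, B1 = 0.261497212847643, 0.561459483566885
    P = primes_upto(x)
    L = math.log(x)
    pi_x = len(P)
    theta = sum(math.log(p) for p in P)          # Chebyshev's theta function
    recip = sum(1 / p for p in P)
    mertens = math.prod(1 - 1 / p for p in P)
    p_n = P[n - 1]                                # nth prime; needs pi(x) >= n
    ln, lln = math.log(n), math.log(math.log(n))
    eps = 1 / (2 * L * L)
    return {
        "(i)": x / L * (1 + 1 / (2 * L)) < pi_x < x / L * (1 + 3 / (2 * L)),
        "(ii)": n * (ln + lln - 1.5) < p_n < n * (ln + lln - 0.5),
        "(iii)": x * (1 - 1 / (2 * L)) < theta < x * (1 + 1 / (2 * L)),
        "(iv)": math.log(L) + A - eps < recip < math.log(L) + A + eps,
        "(v)": B1 / L * (1 - eps) < mertens < B1 / L * (1 + eps),
    }


if __name__ == "__main__":
    print(check_theorem_5_19())
```

The sketch uses `math.isqrt` and `math.prod`, so it needs Python 3.8 or later.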


5.5.4 Primes in arithmetic progressions 

In Theorems 2.35 and 2.36, we proved that there are infinitely many primes
$p \equiv 1 \pmod 4$ and infinitely many primes $p \equiv 3 \pmod 4$. These results are actually
special cases of a much more general result.

Let d be a positive integer, and let a be any integer. An arithmetic progression
with first term a and common difference d consists of all integers of the form

$$a + dm, \quad m = 0, 1, 2, \ldots.$$

The question is: under what conditions does such an arithmetic progression contain
infinitely many primes? An equivalent formulation is: under what conditions are
there infinitely many primes $p \equiv a \pmod d$? If a and d have a common factor
$c > 1$, then every term in the progression is divisible by c, and so there can be at
most one prime in the progression. So a necessary condition for the existence of
infinitely many primes $p \equiv a \pmod d$ is that $\gcd(a, d) = 1$. A famous theorem due
to Dirichlet states that this is a sufficient condition as well.

Theorem 5.20 (Dirichlet’s theorem). Let $a, d \in \mathbb{Z}$ with $d > 0$ and $\gcd(a, d) = 1$.
Then there are infinitely many primes $p \equiv a \pmod d$.

Proof. Literature — see §5.6. □ 

We can also ask about the density of primes in arithmetic progressions. One
might expect that for a fixed value of d, the primes are distributed in roughly equal
measure among the $\varphi(d)$ different residue classes $[a]_d$ with $\gcd(a, d) = 1$ (here, $\varphi$
is Euler’s phi function). This is in fact the case. To formulate such assertions, we
define $\pi(x; d, a)$ to be the number of primes p up to x with $p \equiv a \pmod d$.

Theorem 5.21. Let $a, d \in \mathbb{Z}$ with $d > 0$ and $\gcd(a, d) = 1$. Then

$$\pi(x; d, a) \sim \frac{x}{\varphi(d) \log x}.$$

Proof. Literature — see §5.6. □ 
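A quick way to see Theorem 5.21 at work is to tabulate $\pi(x; d, a)$ for a small modulus. The Python sketch below counts the primes up to $x = 10^6$ in each residue class modulo d; the choice of x and of the moduli 4 and 10 is arbitrary, and the helper name `count_by_residue` is hypothetical.

```python
import math
from collections import Counter


def primes_upto(n):
    """List of primes <= n (requires n >= 1), via a sieve of Eratosthenes."""
    A = bytearray([1]) * (n + 1)
    A[0:2] = b"\x00\x00"
    for k in range(2, math.isqrt(n) + 1):
        if A[k]:
            A[k * k :: k] = bytes(len(range(k * k, n + 1, k)))
    return [i for i in range(n + 1) if A[i]]


def count_by_residue(x, d):
    """Counter mapping each residue a (0 <= a < d) to the number of primes p <= x with p % d == a."""
    return Counter(p % d for p in primes_upto(x))


if __name__ == "__main__":
    x = 10**6
    for d in (4, 10):
        counts = count_by_residue(x, d)
        phi_d = sum(1 for a in range(d) if math.gcd(a, d) == 1)
        print(d, dict(sorted(counts.items())),
              "x/(phi(d) log x) =", round(x / (phi_d * math.log(x))))
```

Each class coprime to d receives close to a $1/\varphi(d)$ share, while a class sharing a factor with d holds at most one prime (the prime 2 for $d = 4$, and the primes 2 and 5 for $d = 10$).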

The above theorem is only applicable in the case where d and a are fixed as
$x \to \infty$. For example, it says that roughly half the primes up to x are congruent
to 1 modulo 4, and roughly half the primes up to x are congruent to 3 modulo 4.
However, suppose $d \to \infty$, and we want to estimate, say, the number of primes
$p \equiv 1 \pmod d$ up to $d^3$. Theorem 5.21 does not help us here. The following
conjecture does, however:


Conjecture 5.22. Let $x \in \mathbb{R}$ and $a, d \in \mathbb{Z}$ with $x \ge 2$, $d \ge 2$, and $\gcd(a, d) = 1$. Then

$$\left| \pi(x; d, a) - \frac{\mathrm{li}(x)}{\varphi(d)} \right| \le x^{1/2} (\log x + 2 \log d).$$


The above conjecture is in fact a consequence of a generalization of the Riemann
hypothesis (see §5.6). This conjecture implies that for every constant
$\alpha < 1/2$, if $2 \le d \le x^{\alpha}$, then $\pi(x; d, a)$ is closely approximated by $\mathrm{li}(x)/\varphi(d)$
(see Exercise 5.24). It can also be used to get an upper bound on the least prime
$p \equiv a \pmod d$ (see Exercise 5.25). The following theorem is the best rigorously
proven upper bound on the smallest prime in an arithmetic progression:


Theorem 5.23. There exists a constant c such that for all $a, d \in \mathbb{Z}$ with $d \ge 2$ and
$\gcd(a, d) = 1$, the least prime $p \equiv a \pmod d$ is at most $c d^{11/2}$.


Proof. Literature — see §5.6. □ 
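For small moduli the least prime in a progression is easy to find by direct search, which also shows how far below the bound $c d^{11/2}$ it typically lies. The Python sketch below walks along $a, a + d, a + 2d, \ldots$ with a trial-division primality test; it is illustrative only (trial division is far too slow for large d), its function names are hypothetical, and its termination relies on Dirichlet's theorem.

```python
import math


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def least_prime_in_progression(a, d):
    """Smallest prime p with p % d == a % d; requires d >= 1 and gcd(a, d) == 1."""
    if d < 1 or math.gcd(a, d) != 1:
        raise ValueError("need d >= 1 and gcd(a, d) = 1")
    p = a % d
    while not is_prime(p):        # terminates by Dirichlet's theorem (Theorem 5.20)
        p += d
    return p


if __name__ == "__main__":
    # ratio (least prime = 1 mod d) / d^2, maximized over 2 <= d <= 1000
    worst = max(least_prime_in_progression(1, d) / d**2 for d in range(2, 1001))
    print(least_prime_in_progression(1, 100), worst)
```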


Exercise 5.24. Assuming Conjecture 5.22, show that for all $\alpha, \varepsilon$ satisfying
$0 < \alpha < 1/2$ and $0 < \varepsilon < 1$, there exists an $x_0$, such that for all $x \ge x_0$, for
all $d \in \mathbb{Z}$ with $2 \le d \le x^{\alpha}$, and for all $a \in \mathbb{Z}$ relatively prime to d, the number of
primes $p \le x$ such that $p \equiv a \pmod d$ is at least $(1 - \varepsilon)\,\mathrm{li}(x)/\varphi(d)$ and at most
$(1 + \varepsilon)\,\mathrm{li}(x)/\varphi(d)$.

Exercise 5.25. Assuming Conjecture 5.22, show that there exists a constant
c such that for all $a, d \in \mathbb{Z}$ with $d \ge 2$ and $\gcd(a, d) = 1$, the least prime
$p \equiv a \pmod d$ is at most $c \varphi(d)^2 (\log d)^4$.
5.5.5 Sophie Germain primes

A Sophie Germain prime is a prime p such that 2p + 1 is also prime. Such primes
are actually useful in a number of practical applications, and so we discuss them
briefly here.

It is an open problem to prove (or disprove) that there are infinitely many Sophie
Germain primes. However, numerical evidence, and heuristic arguments, strongly
suggest not only that there are infinitely many such primes, but also a fairly precise
estimate on the density of such primes.

Let $\pi^*(x)$ denote the number of Sophie Germain primes up to x.


Conjecture 5.24. We have

$$\pi^*(x) \sim C \frac{x}{(\log x)^2},$$

where C is the constant

$$C := 2 \prod_{p > 2} \frac{p(p - 2)}{(p - 1)^2} \approx 1.32032,$$

and the product is over all primes $p > 2$.
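Both sides of Conjecture 5.24 are easy to compute for moderate x. The Python sketch below lists Sophie Germain primes from a sieve and approximates C by truncating its product at a finite bound; the cutoffs are arbitrary choices, the helper names are hypothetical, and the comparison is only suggestive.

```python
import math


def prime_flags(n):
    """Bytearray A with A[i] == 1 exactly when i is prime (requires n >= 1)."""
    A = bytearray([1]) * (n + 1)
    A[0:2] = b"\x00\x00"
    for k in range(2, math.isqrt(n) + 1):
        if A[k]:
            A[k * k :: k] = bytes(len(range(k * k, n + 1, k)))
    return A


def sophie_germain_upto(x):
    """Primes p <= x such that 2p + 1 is also prime."""
    A = prime_flags(2 * x + 1)
    return [p for p in range(2, x + 1) if A[p] and A[2 * p + 1]]


def constant_C(bound):
    """Truncation of C = 2 * prod over primes 2 < p <= bound of p(p - 2)/(p - 1)^2."""
    A = prime_flags(bound)
    prod = 2.0
    for p in range(3, bound + 1):
        if A[p]:
            prod *= p * (p - 2) / (p - 1) ** 2
    return prod


if __name__ == "__main__":
    x = 10**6
    C = constant_C(10**6)
    print(sophie_germain_upto(100))
    print(round(C, 5), len(sophie_germain_upto(x)), round(C * x / math.log(x) ** 2))
```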

The above conjecture is a special case of the following, more general conjecture. 


Conjecture 5.25 (Dickson’s conjecture). Let $(a_1, b_1), \ldots, (a_k, b_k)$ be distinct
pairs of integers, where each $a_i$ is positive. Let $P(x)$ be the number of positive
integers m up to x such that $a_i m + b_i$ are simultaneously prime for $i = 1, \ldots, k$.
For each prime p, let $\omega(p)$ be the number of integers $m \in \{0, \ldots, p - 1\}$ that satisfy

$$\prod_{i=1}^{k} (a_i m + b_i) \equiv 0 \pmod p.$$

If $\omega(p) < p$ for each prime p, then

$$P(x) \sim D \frac{x}{(\log x)^k},$$

where

$$D := \prod_p \frac{1 - \omega(p)/p}{(1 - 1/p)^k},$$

the product being over all primes p.


In Exercise 5.26 below, you are asked to verify that the quantity D appearing
in Conjecture 5.25 satisfies $0 < D < \infty$. Conjecture 5.24 is implied by Conjecture 5.25
with $k := 2$, $(a_1, b_1) := (1, 0)$, and $(a_2, b_2) := (2, 1)$; in this case,
$\omega(2) = 1$ and $\omega(p) = 2$ for all $p > 2$. The above conjecture also includes (a strong
version of) the famous twin primes conjecture as a special case: the number of
primes p up to x such that p + 2 is also prime is $\sim Cx/(\log x)^2$, where C is the
same constant as in Conjecture 5.24.
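The quantities $\omega(p)$ and D in Conjecture 5.25 can be computed directly for small p. The Python sketch below counts $\omega(p)$ by brute force over $m \in \{0, \ldots, p - 1\}$ and forms a truncated product for D; it confirms $\omega(2) = 1$ and $\omega(p) = 2$ for odd p for the Sophie Germain pairs $(1, 0), (2, 1)$, and shows that the twin-prime pairs $(1, 0), (1, 2)$ give the same factors, so both truncations land near $C \approx 1.32032$. The function names and the truncation bound are illustrative choices.

```python
import math


def omega(pairs, p):
    """Number of m in {0, ..., p-1} with prod_i (a_i*m + b_i) divisible by p."""
    count = 0
    for m in range(p):
        prod = 1
        for a, b in pairs:
            prod = prod * (a * m + b) % p
        if prod == 0:
            count += 1
    return count


def D_truncated(pairs, bound):
    """Product of (1 - omega(p)/p) / (1 - 1/p)^k over primes p <= bound."""
    k = len(pairs)
    prod = 1.0
    for p in range(2, bound + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):   # p is prime
            prod *= (1 - omega(pairs, p) / p) / (1 - 1 / p) ** k
    return prod


if __name__ == "__main__":
    sophie = [(1, 0), (2, 1)]      # m and 2m + 1
    twins = [(1, 0), (1, 2)]       # m and m + 2
    print([omega(sophie, p) for p in (2, 3, 5, 7)], D_truncated(sophie, 2000))
    print([omega(twins, p) for p in (2, 3, 5, 7)], D_truncated(twins, 2000))
```

By contrast, the pairs $(1, 0), (1, 1)$ (that is, m and m + 1) give $\omega(2) = 2$, so the hypothesis $\omega(p) < p$ fails, consistent with m and m + 1 both being prime only for m = 2.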

A heuristic argument in favor of Conjecture 5.25 runs as follows. In some
sense, the chance that a large positive integer m is prime is about $1/\log m$. Since
$\log(a_i m + b_i) \sim \log m$, the chance that $a_1 m + b_1, \ldots, a_k m + b_k$ are all prime should
be about $1/(\log m)^k$. But this ignores the fact that $a_1 m + b_1, \ldots, a_k m + b_k$ are
not quite random integers. For each prime p, we must apply a “correction factor”
$r_p/s_p$, where $r_p$ is the chance that for random m, none of $a_1 m + b_1, \ldots, a_k m + b_k$ is
divisible by p, and $s_p$ is the chance that for k truly random, large integers, none of
them is divisible by p. One sees that $r_p = 1 - \omega(p)/p$ and $s_p = (1 - 1/p)^k$. This
implies (using §A5 and Exercise 5.6) that $P(x)$ should be about

$$D \sum_{2 \le m \le x} \frac{1}{(\log m)^k} \sim D \int_2^x \frac{dt}{(\log t)^k} \sim D \frac{x}{(\log x)^k}.$$

Although Conjecture 5.25 is well supported by numerical evidence, there seems 
little hope of it being proved any time soon, even under the Riemann hypothesis or 
any of its generalizations. 


Exercise 5.26. Show that the quantity D appearing in Conjecture 5.25 satisfies
$0 < D < \infty$. Hint: first show that $\omega(p) = k$ for all sufficiently large p.

Exercise 5.27. Derive Theorem 5.21 from Conjecture 5.25. 

Exercise 5.28. Show that the constant C appearing in Conjecture 5.24 satisfies

$$2C = B_2/B_1^2,$$

where $B_1$ and $B_2$ are the constants from Exercises 5.12 and 5.13.


5.6 Notes 

The prime number theorem was conjectured by Gauss in 1791. It was proven 
independently in 1896 by Hadamard and de la Vallee Poussin. A proof of the prime
number theorem may be found, for example, in the book by Hardy and Wright [46]. 

Theorem 5.19, as well as the estimates for the constants A, $B_1$, and $B_2$ mentioned
in that theorem and Exercises 5.11, 5.12, and 5.13, are from Rosser and Schoenfeld
[83].

Theorem 5.15 is from Walfisz [102]. 
Theorem 5.17, which made the first connection between the theory of prime
numbers and the zeta function, was discovered in the 18th century by Euler. The
Riemann hypothesis was made by Riemann in 1859, and to this day, remains one
of the most vexing conjectures in mathematics. Riemann in fact showed that his
conjecture about the zeros of the zeta function is equivalent to the conjecture that
for each fixed $\varepsilon > 0$, $\pi(x) = \mathrm{li}(x) + O(x^{1/2+\varepsilon})$. This was strengthened by von
Koch in 1901, who showed that the Riemann hypothesis is true if and only if
$\pi(x) = \mathrm{li}(x) + O(x^{1/2} \log x)$. See Chapter 1 of the book by Crandall and Pomerance
[30] for more on the connection between the Riemann hypothesis and the theory 
of prime numbers; in particular, see Exercise 1.36 in that book for an outline of a 
proof that Conjecture 5.16 follows from the Riemann hypothesis. 

A warning: some authors (and software packages) define the logarithmic integral
using the interval of integration (0, x), rather than (2, x), which increases its
value by a constant $c \approx 1.0452$.

Theorem 5.20 was proved by Dirichlet in 1837, while Theorem 5.21 was proved
by de la Vallee Poussin in 1896. A result of Oesterle [73] implies that Conjecture 5.22
for $d \ge 3$ is a consequence of an assumption about the location of the
zeros of certain generalizations of Riemann’s zeta function; the case $d = 2$ follows
from the bound in Conjecture 5.16 under the ordinary Riemann hypothesis. Theorem 5.23
is from Heath-Brown [47]. The bound in Exercise 5.25 can be improved
to $c \varphi(d)^2 (\log d)^2$ (see Theorem 8.5.8 of [11]).

Conjecture 5.25 originates from Dickson [33]. In fact, Dickson only conjectured 
that the quantity P(x) defined in Conjecture 5.25 tends to infinity. The conjectured 
formula for the rate of growth of P(x) is a special case of a more general conjec- 
ture stated by Bateman and Horn [12], which generalizes various, more specific 
conjectures stated by Hardy and Littlewood [45]. 

For the reader who is interested in learning more on the topics discussed in this 
chapter, we recommend the books by Apostol [8] and Hardy and Wright [46]; 
indeed, many of the proofs presented in this chapter are minor variations on proofs
from these two books. Our proof of Bertrand’s postulate is based on the presen- 
tation in Section 9.2 of Redmond [80]. See also Bach and Shallit [11] (especially 
Chapter 8), as well as Crandall and Pomerance [30] (especially Chapter 1), for a 
more detailed overview of these topics. 

The data in Tables 5.1 and 5.2 was obtained using the computer program Maple. 



6 

Abelian groups 


This chapter introduces the notion of an abelian group. This is an abstraction that 
models many different algebraic structures, and yet despite the level of generality, 
a number of very useful results can be easily obtained. 


6.1 Definitions, basic properties, and examples 

Definition 6.1. An abelian group is a set G together with a binary operation ★ on
G such that:

(i) for all $a, b, c \in G$, $a \star (b \star c) = (a \star b) \star c$ (i.e., ★ is associative);

(ii) there exists $e \in G$ (called the identity element) such that for all $a \in G$,
$a \star e = a = e \star a$;

(iii) for all $a \in G$ there exists $a' \in G$ (called the inverse of a) such that
$a \star a' = e = a' \star a$;

(iv) for all $a, b \in G$, $a \star b = b \star a$ (i.e., ★ is commutative).

While there is a more general notion of a group, which may be defined simply 
by dropping property (iv) in Definition 6.1, we shall not need this notion in this 
text. The restriction to abelian groups helps to simplify the discussion significantly. 
Because we will only be dealing with abelian groups, we may occasionally simply 
say “group” instead of “abelian group.” 

Before looking at examples, let us state some very basic properties of abelian 
groups that follow directly from the definition: 

Theorem 6.2. Let G be an abelian group with binary operation ★. Then we have: 

(i) G contains only one identity element; 

(ii) every element of G has only one inverse. 
Proof. Suppose $e, e'$ are both identities. Then we have

$$e = e \star e' = e',$$

where we have used part (ii) of Definition 6.1, once with $e'$ as the identity, and
once with e as the identity. That proves part (i) of the theorem.

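To prove part (ii) of the theorem, let $a \in G$, and suppose that a has two inverses,
$a'$ and $a''$. Then using parts (i)-(iii) of Definition 6.1, we have

$$\begin{aligned} a' &= a' \star e && \text{(by part (ii))} \\ &= a' \star (a \star a'') && \text{(by part (iii) with inverse } a'' \text{ of } a) \\ &= (a' \star a) \star a'' && \text{(by part (i))} \\ &= e \star a'' && \text{(by part (iii) with inverse } a' \text{ of } a) \\ &= a'' && \text{(by part (ii))}. \quad \square \end{aligned}$$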

These uniqueness properties justify use of the definite article in Definition 6.1 
in conjunction with the terms “identity element” and “inverse.” Note that we never
used part (iv) of the definition in the proof of the above theorem.

Abelian groups are lurking everywhere, as the following examples illustrate.

Example 6.1. The set of integers $\mathbb{Z}$ under addition forms an abelian group, with 0
being the identity, and $-a$ being the inverse of $a \in \mathbb{Z}$. □

Example 6.2. For each integer n, the set $n\mathbb{Z} = \{nz : z \in \mathbb{Z}\}$ under addition forms
an abelian group, again, with 0 being the identity, and $n(-z)$ being the inverse of
$nz$. □

Example 6.3. The set of non-negative integers under addition does not form an 
abelian group, since additive inverses do not exist for any positive integers. □ 

Example 6.4. The set of integers under multiplication does not form an abelian 
group, since inverses do not exist for any integers other than ±1. □ 

Example 6.5. The set of integers {±1} under multiplication forms an abelian 
group, with 1 being the identity, and -1 its own inverse. □ 

Example 6.6. The set of rational numbers $\mathbb{Q} = \{a/b : a, b \in \mathbb{Z}, b \ne 0\}$ under
addition forms an abelian group, with 0 being the identity, and $(-a)/b$ being the
inverse of $a/b$. □

Example 6.7. The set of non-zero rational numbers $\mathbb{Q}^*$ under multiplication forms
an abelian group, with 1 being the identity, and $b/a$ being the inverse of $a/b$. □

Example 6.8. The set $\mathbb{Z}_n$ under addition forms an abelian group, where $[0]_n$ is the
identity, and where $[-a]_n$ is the inverse of $[a]_n$. □
Example 6.9. The set $\mathbb{Z}_n^*$ of residue classes $[a]_n$ with $\gcd(a, n) = 1$ under
multiplication forms an abelian group, where $[1]_n$ is the identity, and if b is a multiplicative
inverse of a modulo n, then $[b]_n$ is the inverse of $[a]_n$. □

Example 6.10. For every positive integer n, the set of n-bit strings under the 
“exclusive or” operation forms an abelian group, where the “all zero” bit string 
is the identity, and every bit string is its own inverse. □ 

Example 6.11. The set $\mathcal{F}^*$ of all arithmetic functions f such that $f(1) \ne 0$, and
with the Dirichlet product as the binary operation (see §2.9) forms an abelian 
group. The special function I is the identity, and inverses are guaranteed by 
Exercise 2.54. □ 

Example 6.12. The set of all finite bit strings under concatenation does not form 
an abelian group. Although concatenation is associative and the empty string acts 
as an identity element, inverses do not exist (except for the empty string), nor is 
concatenation commutative. □ 

Example 6.13. The set of $2 \times 2$ integer matrices with determinant ±1, together
with the binary operation of matrix multiplication, is an example of a non-abelian 
group; that is, it satisfies properties (i)-(iii) of Definition 6.1, but not property 
(iv). □ 

Example 6.14. The set of all permutations on a given set of size $n \ge 3$, together
with the binary operation of function composition, is another example of a non- 
abelian group (for n = 1,2, it is an abelian group). □ 
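For finite examples like $\mathbb{Z}_n$, $\mathbb{Z}_n^*$, and the n-bit strings under exclusive-or, the conditions of Definition 6.1 can be checked by brute force. The Python sketch below does this for a finite set and a Python function standing in for ★; it also checks closure, which the definition takes for granted by calling ★ a binary operation on G. It runs in time cubic in $|G|$ and is meant only for small examples; `is_abelian_group` is a hypothetical name.

```python
from itertools import permutations, product


def is_abelian_group(elements, op):
    """Brute-force check of Definition 6.1 for a finite set and operation op."""
    G = list(elements)
    S = set(G)
    # op must actually be a binary operation on G (closure)
    if any(op(a, b) not in S for a, b in product(G, G)):
        return False
    # (i) associativity
    if any(op(a, op(b, c)) != op(op(a, b), c) for a, b, c in product(G, G, G)):
        return False
    # (iv) commutativity, checked early so (ii) and (iii) need only one side
    if any(op(a, b) != op(b, a) for a, b in product(G, G)):
        return False
    # (ii) an identity element e (an empty G has none, so it fails here)
    identities = [e for e in G if all(op(a, e) == a for a in G)]
    if not identities:
        return False
    e = identities[0]
    # (iii) every element has an inverse
    return all(any(op(a, b) == e for b in G) for a in G)


if __name__ == "__main__":
    n = 8
    print(is_abelian_group(range(n), lambda a, b: (a + b) % n))    # Z_8 under addition
    print(is_abelian_group([1, 3, 5, 7], lambda a, b: a * b % n))  # Z_8^* under multiplication
    print(is_abelian_group(range(n), lambda a, b: a * b % n))      # 0 has no inverse
    print(is_abelian_group(range(n), lambda a, b: a ^ b))          # 3-bit strings under XOR
    compose = lambda f, g: tuple(f[g[i]] for i in range(3))
    print(is_abelian_group(permutations(range(3)), compose))       # not commutative
```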

Consider an abelian group G with binary operation ★. Since the group operation
is associative, for all $a_1, \ldots, a_k \in G$, we may write $a_1 \star \cdots \star a_k$ without parentheses,
and there can be no ambiguity as to the value of such an expression: any explicit
parenthesization of this expression yields the same value. Furthermore, since the
group operation is commutative, reordering the $a_i$’s does not change this value.

Note that in specifying a group, one must specify both the underlying set G as 
well as the binary operation; however, in practice, the binary operation is often 
implicit from context, and by abuse of notation, one often refers to G itself as the 
group. For example, when talking about the abelian groups $\mathbb{Z}$ and $\mathbb{Z}_n$, it is
understood that the group operation is addition, while when talking about the abelian
group $\mathbb{Z}_n^*$, it is understood that the group operation is multiplication.

Typically, instead of using a special symbol like “★” for the group operation, one
uses the usual addition (“+”) or multiplication (“·”) operations.

Additive notation. If an abelian group G is written additively, using “+” as
the group operation, then the identity element is denoted by $0_G$ (or just 0 if G is
clear from context), and is also called the zero element. The inverse of an element
$a \in G$ is denoted by $-a$. For $a, b \in G$, $a - b$ denotes $a + (-b)$.

Multiplicative notation. If an abelian group G is written multiplicatively, using
“·” as the group operation, then the identity element is denoted by $1_G$ (or just 1 if
G is clear from context). The inverse of an element $a \in G$ is denoted by $a^{-1}$. As
usual, one may write $ab$ in place of $a \cdot b$. Also, one may write $a/b$ for $ab^{-1}$.

For any particular, concrete abelian group, the most natural choice of notation is 
clear (e.g., addition for $\mathbb{Z}$ and $\mathbb{Z}_n$, multiplication for $\mathbb{Z}_n^*$); however, for a “generic”
group, the choice is largely a matter of taste. By convention, whenever we con- 
sider a “generic” abelian group, we shall use additive notation for the group 
operation, unless otherwise specified. 

The next theorem states a few simple but useful properties of abelian groups 
(stated using our default, additive notation). 

Theorem 6.3. Let G be an abelian group. Then for all $a, b, c \in G$, we have:

(i) if $a + b = a + c$, then $b = c$;

(ii) the equation $a + x = b$ has a unique solution $x \in G$;

(iii) $-(a + b) = (-a) + (-b)$;

(iv) $-(-a) = a$.

Proof. These statements all follow easily from Definition 6.1 and Theorem 6.2.
For (i), just add $-a$ to both sides of the equation $a + b = a + c$. For (ii), the solution
is $x = b - a$. For (iii), we have

$$(a + b) + ((-a) + (-b)) = (a + (-a)) + (b + (-b)) = 0_G + 0_G = 0_G,$$

which shows that $(-a) + (-b)$ is indeed the inverse of $a + b$. For (iv), we have
$(-a) + a = 0_G$, which means that a is the inverse of $-a$. □

Part (i) of the above theorem is the cancellation law for abelian groups. 

If $a_1, \ldots, a_k$ are elements of an abelian group G, we naturally write $\sum_{i=1}^{k} a_i$ for
their sum $a_1 + \cdots + a_k$. By convention, the sum is $0_G$ when $k = 0$. Part (iii) of
Theorem 6.3 obviously generalizes, so that $-\sum_{i=1}^{k} a_i = \sum_{i=1}^{k} (-a_i)$. In the special
case where all the $a_i$’s have the same value a, we define $k \cdot a := \sum_{i=1}^{k} a$, whose
inverse is $k \cdot (-a)$, which we may write as $(-k) \cdot a$. Thus, the notation $k \cdot a$, or
more simply, $ka$, is defined for all integers k. Observe that by definition, $1a = a$
and $(-1)a = -a$.

Theorem 6.4. Let G be an abelian group. Then for all $a, b \in G$ and $k, \ell \in \mathbb{Z}$, we
have:

(i) $k(\ell a) = (k\ell)a = \ell(ka)$;

(ii) $(k + \ell)a = ka + \ell a$;

(iii) $k(a + b) = ka + kb$.

Proof. The proof of this is easy, but tedious. We leave the details as an exercise to 
the reader. □ 

Multiplicative notation: It is perhaps helpful to translate the above discussion
from additive to multiplicative notation. If a group G is written using multiplicative
notation, then Theorem 6.3 says that (i) $ab = ac$ implies $b = c$, (ii)
$ax = b$ has a unique solution, (iii) $(ab)^{-1} = a^{-1}b^{-1}$, and (iv) $(a^{-1})^{-1} = a$. If
$a_1, \ldots, a_k \in G$, we write their product $a_1 \cdots a_k$ as $\prod_{i=1}^{k} a_i$, which is $1_G$ when
$k = 0$. We have $(\prod_{i=1}^{k} a_i)^{-1} = \prod_{i=1}^{k} a_i^{-1}$. We also define $a^k := \prod_{i=1}^{k} a$, and
we have $(a^k)^{-1} = (a^{-1})^k$, which we may write as $a^{-k}$. Theorem 6.4 says that (i)
$(a^\ell)^k = a^{k\ell} = (a^k)^\ell$, (ii) $a^{k+\ell} = a^k a^\ell$, and (iii) $(ab)^k = a^k b^k$.
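In $\mathbb{Z}_n^*$ these power laws can be spot-checked with Python's built-in three-argument `pow`, which since Python 3.8 also accepts a negative exponent when the base is invertible modulo n, returning the corresponding power of the inverse. The sketch below is a finite sanity check over a few small exponents, not a proof; `check_power_laws` is a hypothetical helper, and residue classes are represented by their least non-negative representatives.

```python
import math
from itertools import product


def check_power_laws(n, exps=range(-4, 5)):
    """Spot-check Theorem 6.3(iii) and Theorem 6.4 (multiplicative form) in Z_n^*, n >= 2.

    pow(a, k, n) with k < 0 (Python 3.8+) computes (a^{-1})^{-k} mod n.
    """
    U = [a for a in range(1, n) if math.gcd(a, n) == 1]
    for a in U:
        for k, l in product(exps, exps):
            ak, al = pow(a, k, n), pow(a, l, n)
            assert pow(ak, l, n) == pow(a, k * l, n) == pow(al, k, n)   # 6.4(i)
            assert pow(a, k + l, n) == ak * al % n                      # 6.4(ii)
    for a, b in product(U, U):
        assert pow(a * b % n, -1, n) == pow(a, -1, n) * pow(b, -1, n) % n   # 6.3(iii)
        for k in exps:
            assert pow(a * b % n, k, n) == pow(a, k, n) * pow(b, k, n) % n  # 6.4(iii)
    return True


if __name__ == "__main__":
    print(pow(3, -1, 7), pow(3, -2, 7), check_power_laws(15))
```

Here `pow(3, -1, 7)` is 5 because $3 \cdot 5 = 15 \equiv 1 \pmod 7$, and `pow(3, -2, 7)` is $5^2 \bmod 7 = 4$.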

An abelian group G may be trivial, meaning that it consists of just the zero
element $0_G$, with $0_G + 0_G = 0_G$. An abelian group G may be infinite or finite: if the
group is finite, we define its order to be the number of elements in the underlying
set G; otherwise, we say that the group has infinite order.

Example 6.15. The order of the additive group $\mathbb{Z}_n$ is n. If $n = 1$, then $\mathbb{Z}_n$ is the
trivial group. □

Example 6.16. The order of the multiplicative group $\mathbb{Z}_n^*$ is $\varphi(n)$, where $\varphi$ is Euler’s
phi function, defined in §2.6. □
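Example 6.16 is easy to confirm for small n by listing the residue classes directly. The Python sketch below represents $[a]_n$ by its least non-negative representative a, keeps those with $\gcd(a, n) = 1$, and compares the count with $\varphi(n)$ computed independently from the standard product formula $\varphi(n) = n \prod_{p \mid n} (1 - 1/p)$. The helper names are hypothetical; for $n = 1$ the single class $[0]_1$ counts, matching $\varphi(1) = 1$.

```python
import math


def phi(n):
    """Euler's phi function via phi(n) = n * prod over primes p | n of (1 - 1/p)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p          # multiply result by (1 - 1/p), exactly
        p += 1
    if m > 1:                              # one prime factor > sqrt(n) may remain
        result -= result // m
    return result


def unit_group(n):
    """Representatives a in {0, ..., n-1} of the classes [a]_n in Z_n^*."""
    return [a for a in range(n) if math.gcd(a, n) == 1]


if __name__ == "__main__":
    print(unit_group(12), phi(12))
    print(all(len(unit_group(n)) == phi(n) for n in range(1, 500)))
```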

Example 6.17. The additive group Z has infinite order. □ 

We close this section with two simple constructions for combining groups to 
build new groups. 

Example 6.18. If $G_1, \ldots, G_k$ are abelian groups, we can form the direct product
$H := G_1 \times \cdots \times G_k$, which consists of all k-tuples $(a_1, \ldots, a_k)$ with
$a_1 \in G_1, \ldots, a_k \in G_k$. We can view H in a natural way as an abelian group if we define the
group operation component-wise:

$$(a_1, \ldots, a_k) + (b_1, \ldots, b_k) := (a_1 + b_1, \ldots, a_k + b_k).$$

Of course, the groups $G_1, \ldots, G_k$ may be different, and the group operation applied
in the ith component corresponds to the group operation associated with $G_i$. We
leave it to the reader to verify that H is in fact an abelian group, where
$0_H = (0_{G_1}, \ldots, 0_{G_k})$ and $-(a_1, \ldots, a_k) = (-a_1, \ldots, -a_k)$. As a special case, if
$G = G_1 = \cdots = G_k$, then the k-wise direct product of G is denoted $G^{\times k}$. □
Example 6.19. Let G be an abelian group. An element $(a_1, \ldots, a_k)$ of $G^{\times k}$ may be
identified with the function $f : \{1, \ldots, k\} \to G$ given by $f(i) = a_i$ for $i = 1, \ldots, k$.
We can generalize this, replacing $\{1, \ldots, k\}$ by an arbitrary set I. We define
$\mathrm{Map}(I, G)$ to be the set of all functions $f : I \to G$, which we naturally view
as a group by defining the group operation point-wise: for $f, g \in \mathrm{Map}(I, G)$, we
define

$$(f + g)(i) := f(i) + g(i) \quad \text{for all } i \in I.$$

Again, we leave it to the reader to verify that $\mathrm{Map}(I, G)$ is an abelian group,
where the identity element is the function that maps each $i \in I$ to $0_G$, and for
$f \in \mathrm{Map}(I, G)$, we have $(-f)(i) = -(f(i))$ for all $i \in I$. □

Exercise 6.1. For a finite abelian group, one can completely specify the group 
by writing down the group operation table. For instance, Example 2.7 presented an
addition table for Zg. 

(a) Write down group operation tables for the following finite abelian groups: 

Z 5 ,Z*, and Z 3 x Z*. 

(b) Show that the group operation table for every finite abelian group is a Latin 
square; that is, each element of the group appears exactly once in each row 
and column. 

(c) Below is an addition table for an abelian group that consists of the elements 
$\{a, b, c, d\}$; however, some entries are missing. Fill in the missing entries.


+ 

abed 

a 

a 

b 

b a 

c 

a 

d 



Exercise 6.2. Let $G := \{x \in \mathbb{R} : x > 1\}$, and define $a \star b := ab - a - b + 2$ for
all $a, b \in \mathbb{R}$. Show that:

(a) G is closed under ★;

(b) the set G under the operation ★ forms an abelian group.

Exercise 6.3. Let G be an abelian group, and let g be an arbitrary, fixed element
of G. Assume that the group operation of G is written additively. We define a new
binary operation $\odot$ on G, as follows: for $a, b \in G$, let $a \odot b := a + b + g$. Show that
the set G under $\odot$ forms an abelian group.

Exercise 6.4. Let G be a finite abelian group of even order. Show that there
exists $a \in G$ with $a \ne 0_G$ and $2a = 0_G$.
Exercise 6.5. Let ★ be a binary operation on a non-empty, finite set G. Assume
that ★ is associative, commutative, and satisfies the cancellation law: $a \star b = a \star c$
implies $b = c$. Show that G under ★ forms an abelian group.

Exercise 6.6. Show that the result of the previous exercise need not hold if G is 
infinite. 


6.2 Subgroups 

We next introduce the notion of a subgroup. 

Definition 6.5. Let $G$ be an abelian group, and let $H$ be a non-empty subset of $G$ such that

(i) $a + b \in H$ for all $a, b \in H$, and

(ii) $-a \in H$ for all $a \in H$.

Then H is called a subgroup of G. 

In words: $H$ is a subgroup of $G$ if it is closed under the group operation and taking inverses.

Multiplicative notation: if the abelian group $G$ in the above definition is written using multiplicative notation, then $H$ is a subgroup if $ab \in H$ and $a^{-1} \in H$ for all $a, b \in H$.

Theorem 6.6. If $G$ is an abelian group, and $H$ is a subgroup of $G$, then $H$ contains $0_G$; moreover, the binary operation of $G$, when restricted to $H$, yields a binary operation that makes $H$ into an abelian group whose identity is $0_G$.

Proof. First, to see that $0_G \in H$, just pick any $a \in H$, and using both properties of the definition of a subgroup, we see that $0_G = a + (-a) \in H$.

Next, note that by property (i) of Definition 6.5, $H$ is closed under addition, which means that the restriction of the binary operation “+” on $G$ to $H$ induces a well-defined binary operation on $H$. So now it suffices to show that $H$, together with this operation, satisfies the defining properties of an abelian group. Associativity and commutativity follow directly from the corresponding properties for $G$. Since $0_G$ acts as the identity on $G$, it does so on $H$ as well. Finally, property (ii) of Definition 6.5 guarantees that every element $a \in H$ has an inverse in $H$, namely, $-a$. □

Clearly, for an abelian group $G$, the subsets $G$ and $\{0_G\}$ are subgroups, though not very interesting ones. Other, more interesting subgroups may sometimes be found by using the following two theorems.
Theorem 6.7. Let $G$ be an abelian group, and let $m$ be an integer. Then

$$mG := \{ma : a \in G\}$$

is a subgroup of $G$.

Proof. The set $mG$ is non-empty, since $0_G = m0_G \in mG$. For $ma, mb \in mG$, we have $ma + mb = m(a + b) \in mG$, and $-(ma) = m(-a) \in mG$. □

Theorem 6.8. Let $G$ be an abelian group, and let $m$ be an integer. Then

$$G\{m\} := \{a \in G : ma = 0_G\}$$

is a subgroup of $G$.

Proof. The set $G\{m\}$ is non-empty, since $m0_G = 0_G$, and so $G\{m\}$ contains $0_G$. If $ma = 0_G$ and $mb = 0_G$, then $m(a + b) = ma + mb = 0_G + 0_G = 0_G$ and $m(-a) = -(ma) = -0_G = 0_G$. □

Multiplicative notation: if the abelian group $G$ in the above two theorems is written using multiplicative notation, then we write the subgroup of the first theorem as $G^m := \{a^m : a \in G\}$. The subgroup in the second theorem is denoted in the same way: $G\{m\} := \{a \in G : a^m = 1_G\}$.

Example 6.20. We already proved that $(\mathbb{Z}_n^*)^m$ is a subgroup of $\mathbb{Z}_n^*$ in Theorem 2.16. Also, the proof of Theorem 2.17 clearly works for an arbitrary abelian group $G$: for each $a \in G$, and all $\ell, m \in \mathbb{Z}$ with $\gcd(\ell, m) = 1$, if $\ell a \in mG$, then $a \in mG$. □

Example 6.21. Let $p$ be an odd prime. Then by Theorem 2.20, $(\mathbb{Z}_p^*)^2$ is a subgroup of $\mathbb{Z}_p^*$ of order $(p - 1)/2$, and as we saw in Theorem 2.18, $\mathbb{Z}_p^*\{2\} = \{[\pm 1]\}$. □

Example 6.22. For every integer $m$, the set $m\mathbb{Z}$ is the subgroup of the additive group $\mathbb{Z}$ consisting of all multiples of $m$. This is the same as the ideal of $\mathbb{Z}$ generated by $m$, which we already studied in some detail in §1.2. Two such subgroups $m\mathbb{Z}$ and $m'\mathbb{Z}$ are equal if and only if $m = \pm m'$. The subgroup $\mathbb{Z}\{m\}$ is equal to $\mathbb{Z}$ if $m = 0$, and is equal to $\{0\}$ otherwise. □

Example 6.23. Let $n$ be a positive integer, let $m \in \mathbb{Z}$, and consider the subgroup $m\mathbb{Z}_n$ of the additive group $\mathbb{Z}_n$. Now, for every residue class $[z] \in \mathbb{Z}_n$, we have $m[z] = [mz]$. Therefore, $[b] \in m\mathbb{Z}_n$ if and only if there exists $z \in \mathbb{Z}$ such that $mz \equiv b \pmod{n}$. By part (i) of Theorem 2.5, such a $z$ exists if and only if $d \mid b$, where $d := \gcd(m, n)$. Thus, $m\mathbb{Z}_n$ consists precisely of the $n/d$ distinct residue classes

$$[i \cdot d] \quad (i = 0, \ldots, n/d - 1),$$

and in particular, $m\mathbb{Z}_n = d\mathbb{Z}_n$.





Now consider the subgroup $\mathbb{Z}_n\{m\}$ of $\mathbb{Z}_n$. The residue class $[z]$ is in $\mathbb{Z}_n\{m\}$ if and only if $mz \equiv 0 \pmod{n}$. By part (ii) of Theorem 2.5, this happens if and only if $z \equiv 0 \pmod{n/d}$, where $d := \gcd(m, n)$ as above. Thus, $\mathbb{Z}_n\{m\}$ consists precisely of the $d$ residue classes

$$[i \cdot n/d] \quad (i = 0, \ldots, d - 1),$$

and in particular, $\mathbb{Z}_n\{m\} = \mathbb{Z}_n\{d\} = (n/d)\mathbb{Z}_n$. □
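The two descriptions in this example are easy to confirm by brute force. The following Python sketch (helper names hypothetical; residue classes modulo $n$ are represented by the integers $0, \ldots, n-1$) computes $m\mathbb{Z}_n$ and $\mathbb{Z}_n\{m\}$ directly from their definitions and checks, for $n = 15$ and $m = 1, \ldots, 6$, that $m\mathbb{Z}_n = d\mathbb{Z}_n$ has $n/d$ elements and $\mathbb{Z}_n\{m\} = (n/d)\mathbb{Z}_n$ has $d$ elements, where $d = \gcd(m, n)$. It also checks the identity $m\mathbb{Z}_n = \mathbb{Z}_n\{n/d\}$ used in Example 6.24.

```python
from math import gcd


def multiples(m: int, n: int) -> set:
    """m Z_n = {m*z mod n : z in Z_n}, with residues written as 0..n-1."""
    return {(m * z) % n for z in range(n)}


def m_torsion(m: int, n: int) -> set:
    """Z_n{m} = {z in Z_n : m*z = 0 mod n}."""
    return {z for z in range(n) if (m * z) % n == 0}


n = 15
for m in range(1, 7):
    d = gcd(m, n)
    assert multiples(m, n) == multiples(d, n)        # m Z_n = d Z_n
    assert len(multiples(m, n)) == n // d            # ... with n/d elements
    assert m_torsion(m, n) == multiples(n // d, n)   # Z_n{m} = (n/d) Z_n
    assert len(m_torsion(m, n)) == d                 # ... with d elements
    assert multiples(m, n) == m_torsion(n // d, n)   # m Z_n = Z_n{n/d}

print(sorted(multiples(6, 15)), sorted(m_torsion(6, 15)))  # [0, 3, 6, 9, 12] [0, 5, 10]
```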

Example 6.24. For $n = 15$, consider again the table in Example 2.2. For $m = 1, 2, 3, 4, 5, 6$, the elements appearing in the $m$th row of that table form the subgroup $m\mathbb{Z}_n$ of $\mathbb{Z}_n$, and also the subgroup $\mathbb{Z}_n\{n/d\}$, where $d := \gcd(m, n)$. □

Because the abelian groups $\mathbb{Z}$ and $\mathbb{Z}_n$ are of such importance, it is a good idea to completely characterize all subgroups of these abelian groups. As the following two theorems show, the subgroups in Examples 6.22 and 6.23 are the only ones.

Theorem 6.9. If $G$ is a subgroup of $\mathbb{Z}$, then there exists a unique non-negative integer $m$ such that $G = m\mathbb{Z}$. Moreover, for two non-negative integers $m_1$ and $m_2$, we have $m_1\mathbb{Z} \subseteq m_2\mathbb{Z}$ if and only if $m_2 \mid m_1$.

Proof. Actually, we have already proven this. One only needs to observe that a subset $G$ of $\mathbb{Z}$ is a subgroup if and only if it is an ideal of $\mathbb{Z}$, as defined in §1.2 (see Exercise 1.8). The first statement of the theorem then follows from Theorem 1.6. The second statement follows easily from the definitions, as was observed in §1.2. □

Theorem 6.10. If $G$ is a subgroup of $\mathbb{Z}_n$, then there exists a unique positive integer $d$ dividing $n$ such that $G = d\mathbb{Z}_n$. Also, for all positive divisors $d_1, d_2$ of $n$, we have $d_1\mathbb{Z}_n \subseteq d_2\mathbb{Z}_n$ if and only if $d_2 \mid d_1$.

Proof. Note that the second statement implies the uniqueness part of the first statement, so it suffices to prove just the existence part of the first statement and the second statement.

Let $G$ be an arbitrary subgroup of $\mathbb{Z}_n$, and let $H := \{z \in \mathbb{Z} : [z] \in G\}$. We claim that $H$ is a subgroup of $\mathbb{Z}$. To see this, observe that if $a, b \in H$, then $[a]$ and $[b]$ belong to $G$, and hence so do $[a + b] = [a] + [b]$ and $[-a] = -[a]$, and thus $a + b$ and $-a$ belong to $H$. That proves the claim, and Theorem 6.9 implies that $H = d\mathbb{Z}$ for some non-negative integer $d$. It follows that

$$G = \{[y] : y \in H\} = \{[dz] : z \in \mathbb{Z}\} = d\mathbb{Z}_n.$$

Evidently, $n \in H = d\mathbb{Z}$, and hence $d \mid n$ (in particular, $d \neq 0$, since $n > 0$). That proves the existence part of the first statement of the theorem.
statement of the theorem. 




To prove the second statement of the theorem, observe that if $d_1$ and $d_2$ are arbitrary integers, then

$$d_1\mathbb{Z}_n \subseteq d_2\mathbb{Z}_n \iff d_2 z \equiv d_1 \pmod{n} \text{ for some } z \in \mathbb{Z} \iff \gcd(d_2, n) \mid d_1 \quad \text{(by part (i) of Theorem 2.5)}.$$

In particular, if $d_2$ is a positive divisor of $n$, then $\gcd(d_2, n) = d_2$, which proves the second statement. □
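For small $n$, Theorem 6.10 can be confirmed exhaustively. The Python sketch below (helper names hypothetical; residues again written as $0, \ldots, n-1$) tests every non-empty subset of $\mathbb{Z}_{12}$ against Definition 6.5, then checks that the subgroups found are exactly the sets $d\mathbb{Z}_{12}$ for the six positive divisors $d$ of 12, and that $d_1\mathbb{Z}_n \subseteq d_2\mathbb{Z}_n$ exactly when $d_2 \mid d_1$. Enumerating all $2^n$ subsets is only practical for very small $n$.

```python
from itertools import combinations


def is_subgroup(H: frozenset, n: int) -> bool:
    """Definition 6.5 in Z_n: H is non-empty and closed under + and under negation."""
    return (len(H) > 0
            and all((a + b) % n in H for a in H for b in H)
            and all((-a) % n in H for a in H))


def all_subgroups(n: int) -> set:
    """Every subgroup of Z_n, found by testing all non-empty subsets (small n only)."""
    return {frozenset(S)
            for size in range(1, n + 1)
            for S in combinations(range(n), size)
            if is_subgroup(frozenset(S), n)}


def d_multiples(d: int, n: int) -> frozenset:
    """The subgroup d Z_n."""
    return frozenset((d * z) % n for z in range(n))


n = 12
divisors = [d for d in range(1, n + 1) if n % d == 0]    # 1, 2, 3, 4, 6, 12
subgroups = all_subgroups(n)
assert subgroups == {d_multiples(d, n) for d in divisors}
for d1 in divisors:
    for d2 in divisors:
        assert (d_multiples(d1, n) <= d_multiples(d2, n)) == (d1 % d2 == 0)

print(sorted(len(H) for H in subgroups))                  # [1, 2, 3, 4, 6, 12]
```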

Of course, not all abelian groups have such a simple subgroup structure. 

Example 6.25. Consider the group $G = \mathbb{Z}_2 \times \mathbb{Z}_2$. For every non-zero $a \in G$, $a + a = 0_G$. From this, it is clear that the set $H = \{0_G, a\}$ is a subgroup of $G$. However, for every integer $m$, $mG = G$ if $m$ is odd, and $mG = \{0_G\}$ if $m$ is even. Thus, the subgroup $H$ is not of the form $mG$ for any $m$. □

Example 6.26. Consider the group $\mathbb{Z}_{15}^*$. We can enumerate its elements as

$$[\pm 1], \ [\pm 2], \ [\pm 4], \ [\pm 7].$$

Therefore, the elements of $(\mathbb{Z}_{15}^*)^2$ are

$$[1]^2 = [1], \quad [2]^2 = [4], \quad [4]^2 = [16] = [1], \quad [7]^2 = [49] = [4];$$

thus, $(\mathbb{Z}_{15}^*)^2$ has order 2, consisting as it does of the two distinct elements $[1]$ and $[4]$.

Going further, one sees that $(\mathbb{Z}_{15}^*)^4 = \{[1]\}$. Thus, $a^4 = [1]$ for all $a \in \mathbb{Z}_{15}^*$.

By direct calculation, one can determine that $(\mathbb{Z}_{15}^*)^3 = \mathbb{Z}_{15}^*$; that is, cubing simply permutes $\mathbb{Z}_{15}^*$.

For any given integer $m$, write $m = 4q + r$, where $0 \le r < 4$. Then for every $a \in \mathbb{Z}_{15}^*$, we have $a^m = a^{4q + r} = a^{4q} a^r = a^r$. Thus, $(\mathbb{Z}_{15}^*)^m$ is either $\mathbb{Z}_{15}^*$, $(\mathbb{Z}_{15}^*)^2$, or $\{[1]\}$.

However, there are certainly other subgroups of $\mathbb{Z}_{15}^*$, for example, the subgroup $\{[\pm 1]\}$. □
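The calculations in this example are easy to reproduce. In the Python sketch below (helper names hypothetical), elements of $\mathbb{Z}_{15}^*$ are represented by the integers $a$ with $0 < a < 15$ and $\gcd(a, 15) = 1$, and the built-in three-argument `pow` computes $a^m \bmod 15$; negative exponents use the modular inverse, which requires Python 3.8 or later. The loop confirms that $(\mathbb{Z}_{15}^*)^m$ is always one of the three subgroups identified above.

```python
from math import gcd


def units(n: int) -> set:
    """Representatives 0 < a < n of the elements of Z_n^*."""
    return {a for a in range(1, n) if gcd(a, n) == 1}


def power_image(m: int, n: int) -> set:
    """(Z_n^*)^m = {a^m : a in Z_n^*}; negative m uses modular inverses."""
    return {pow(a, m, n) for a in units(n)}


U = units(15)
print(sorted(U))                    # [1, 2, 4, 7, 8, 11, 13, 14]
print(sorted(power_image(2, 15)))   # [1, 4]
print(power_image(4, 15))           # {1}
print(power_image(3, 15) == U)      # True

# Since a^4 = 1 for every a, (Z_15^*)^m depends only on m mod 4.
for m in range(-8, 9):
    assert power_image(m, 15) in (U, {1, 4}, {1})
```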

Example 6.27. Consider the group $\mathbb{Z}_5^* = \{[\pm 1], [\pm 2]\}$. The elements of $(\mathbb{Z}_5^*)^2$ are

$$[1]^2 = [1], \quad [2]^2 = [4] = [-1];$$

thus, $(\mathbb{Z}_5^*)^2 = \{[\pm 1]\}$ and has order 2.

There are in fact no other subgroups of $\mathbb{Z}_5^*$ besides $\mathbb{Z}_5^*$, $\{[\pm 1]\}$, and $\{[1]\}$. Indeed, if $H$ is a subgroup containing $[2]$, then we must have $H = \mathbb{Z}_5^*$: $[2] \in H$ implies $[2]^2 = [4] = [-1] \in H$, which implies $[-2] \in H$ as well. The same holds if $H$ is a subgroup containing $[-2]$. □
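A brute-force search confirms that $\mathbb{Z}_5^*$ has exactly three subgroups. The Python sketch below (helper names hypothetical) checks every non-empty subset of $\{1, 2, 3, 4\}$ against the multiplicative form of Definition 6.5, namely closure under products and under inverses, with inverses computed as `pow(a, -1, n)` (Python 3.8 or later). Note that $[-1]$ and $[-2]$ are represented here by 4 and 3.

```python
from itertools import combinations
from math import gcd


def is_mult_subgroup(H: frozenset, n: int) -> bool:
    """Multiplicative Definition 6.5: H is non-empty and closed under products and inverses."""
    return (len(H) > 0
            and all((a * b) % n in H for a in H for b in H)
            and all(pow(a, -1, n) in H for a in H))


def mult_subgroups(n: int) -> set:
    """All subgroups of Z_n^*, by brute force over non-empty subsets of the units."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    return {frozenset(S)
            for size in range(1, len(units) + 1)
            for S in combinations(units, size)
            if is_mult_subgroup(frozenset(S), n)}


print(sorted(sorted(H) for H in mult_subgroups(5)))   # [[1], [1, 2, 3, 4], [1, 4]]
```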
Example 6.28. Consider again the abelian group $\mathcal{F}^*$ of arithmetic functions $f$ such that $f(1) \neq 0$, with the Dirichlet product as the binary operation, as discussed in Example 6.11. Exercises 2.48 and 2.55 imply that the subset of all multiplicative functions is a subgroup. □

We close this section with two theorems that provide useful ways to build new 
subgroups out of old ones. 

Theorem 6.11. If $H_1$ and $H_2$ are subgroups of an abelian group $G$, then so is

$$H_1 + H_2 := \{a_1 + a_2 : a_1 \in H_1, \ a_2 \in H_2\}.$$

Proof. It is evident that $H_1 + H_2$ is non-empty, as it contains $0_G + 0_G = 0_G$. Consider two elements in $H_1 + H_2$, which we can write as $a_1 + a_2$ and $b_1 + b_2$, where $a_1, b_1 \in H_1$ and $a_2, b_2 \in H_2$. Then by the closure properties of subgroups, $a_1 + b_1 \in H_1$ and $a_2 + b_2 \in H_2$, and hence $(a_1 + a_2) + (b_1 + b_2) = (a_1 + b_1) + (a_2 + b_2) \in H_1 + H_2$. Similarly, $-(a_1 + a_2) = (-a_1) + (-a_2) \in H_1 + H_2$. □

Multiplicative notation: if the abelian group $G$ in the above theorem is written multiplicatively, then the subgroup defined in the theorem is written $H_1 H_2 := \{a_1 a_2 : a_1 \in H_1, \ a_2 \in H_2\}$.

Theorem 6.12. If $H_1$ and $H_2$ are subgroups of an abelian group $G$, then so is

$$H_1 \cap H_2.$$

Proof. It is evident that $H_1 \cap H_2$ is non-empty, as both $H_1$ and $H_2$ contain $0_G$, and hence so does their intersection. If $a \in H_1 \cap H_2$ and $b \in H_1 \cap H_2$, then since $a, b \in H_1$, we have $a + b \in H_1$, and since $a, b \in H_2$, we have $a + b \in H_2$; therefore, $a + b \in H_1 \cap H_2$. Similarly, $-a \in H_1$ and $-a \in H_2$, and therefore, $-a \in H_1 \cap H_2$. □
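Theorems 6.11 and 6.12 can be watched in action in $\mathbb{Z}_n$. For positive divisors $d_1, d_2$ of $n$, one can check that $d_1\mathbb{Z}_n + d_2\mathbb{Z}_n = \gcd(d_1, d_2)\mathbb{Z}_n$ and $d_1\mathbb{Z}_n \cap d_2\mathbb{Z}_n = \operatorname{lcm}(d_1, d_2)\mathbb{Z}_n$. The Python sketch below (helper names hypothetical; residues written as $0, \ldots, n-1$) verifies both identities for all pairs of divisors of $n = 12$, computing $H_1 + H_2$ directly from its definition.

```python
from math import gcd


def multiples_of(d: int, n: int) -> set:
    """The subgroup d Z_n, with residues written as 0..n-1."""
    return {(d * z) % n for z in range(n)}


def subgroup_sum(H1: set, H2: set, n: int) -> set:
    """H1 + H2 = {a1 + a2 : a1 in H1, a2 in H2}, computed in Z_n."""
    return {(a1 + a2) % n for a1 in H1 for a2 in H2}


n = 12
divisors = [d for d in range(1, n + 1) if n % d == 0]
for d1 in divisors:
    for d2 in divisors:
        H1, H2 = multiples_of(d1, n), multiples_of(d2, n)
        lcm = d1 * d2 // gcd(d1, d2)
        assert subgroup_sum(H1, H2, n) == multiples_of(gcd(d1, d2), n)
        assert H1 & H2 == multiples_of(lcm, n)

print(sorted(subgroup_sum(multiples_of(4, n), multiples_of(6, n), n)))  # [0, 2, 4, 6, 8, 10]
```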

Let $G$ be an abelian group and $H_1, H_2, H_3$ subgroups of $G$. The reader may verify that $H_1 + H_2 = H_2 + H_1$ and $(H_1 + H_2) + H_3 = H_1 + (H_2 + H_3)$. It follows that if $H_1, \ldots, H_k$ are subgroups of $G$, then we can write $H_1 + \cdots + H_k$ without any parentheses, and there can be no ambiguity; moreover, the order of the $H_i$ does not matter. The same holds with “$+$” replaced by “$\cap$.”

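A warning: if $H$ is a subgroup of an abelian group $G$, then in general, $H + H \neq 2H$. Indeed, $H + H = H$ always holds, since $H$ is closed under addition and contains $0_G$, whereas $2H$ may be a proper subgroup of $H$. For example, $\mathbb{Z} + \mathbb{Z} = \mathbb{Z}$, while $2\mathbb{Z} \neq \mathbb{Z}$.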


Exercise 6.7. Let $G$ be an abelian group.

(a) Suppose that $H$ is a non-empty subset of $G$. Show that $H$ is a subgroup of $G$ if and only if $a - b \in H$ for all $a, b \in H$.

(b) Suppose that $H$ is a non-empty, finite subset of $G$ such that $a + b \in H$ for all $a, b \in H$. Show that $H$ is a subgroup of $G$.

Exercise 6.8. Let $G$ be an abelian group.

(a) Show that if $H$ is a subgroup of $G$, $h \in H$, and $g \in G \setminus H$, then $h + g \in G \setminus H$.

(b) Suppose that $H$ is a non-empty subset of $G$ such that for all $h, g \in G$: (i) $h \in H$ implies $-h \in H$, and (ii) $h \in H$ and $g \in G \setminus H$ implies $h + g \in G \setminus H$. Show that $H$ is a subgroup of $G$.

Exercise 6.9. Show that if $H$ is a subgroup of an abelian group $G$, then a set $K \subseteq H$ is a subgroup of $G$ if and only if $K$ is a subgroup of $H$.

Exercise 6.10. Let $G$ be an abelian group with subgroups $H_1$ and $H_2$. Show that every subgroup $H$ of $G$ that contains $H_1 \cup H_2$ must contain all of $H_1 + H_2$, and that $H_1 \subseteq H_2$ if and only if $H_1 + H_2 = H_2$.

Exercise 6.11. Let $H_1$ be a subgroup of an abelian group $G_1$ and $H_2$ a subgroup of an abelian group $G_2$. Show that $H_1 \times H_2$ is a subgroup of $G_1 \times G_2$.

Exercise 6.12. Show that if $G_1$ and $G_2$ are abelian groups, and $m$ is an integer, then $m(G_1 \times G_2) = mG_1 \times mG_2$.

Exercise 6.13. Let $G_1$ and $G_2$ be abelian groups, and let $H$ be a subgroup of $G_1 \times G_2$. Define

$$H_1 := \{a_1 \in G_1 : (a_1, a_2) \in H \text{ for some } a_2 \in G_2\}.$$

Show that $H_1$ is a subgroup of $G_1$.

Exercise 6.14. Let $I$ be a set and $G$ be an abelian group, and consider the group $\mathrm{Map}(I, G)$ of functions $f : I \to G$. Let $\mathrm{Map}^{\#}(I, G)$ be the set of functions $f \in \mathrm{Map}(I, G)$ such that $f(i) \neq 0_G$ for at most finitely many $i \in I$. Show that $\mathrm{Map}^{\#}(I, G)$ is a subgroup of $\mathrm{Map}(I, G)$.


6.3 Cosets and quotient groups 

We now generalize the notion of a congruence relation. 

Let $G$ be an abelian group, and let $H$ be a subgroup of $G$. For $a, b \in G$, we write $a \equiv b \pmod{H}$ if $a - b \in H$. In other words, $a \equiv b \pmod{H}$ if and only if $a = b + h$ for some $h \in H$.

Analogous to Theorem 2.2, if we view the subgroup $H$ as fixed, then the following theorem says that the binary relation “$\cdot \equiv \cdot \pmod{H}$” is an equivalence relation on the set $G$:
Theorem 6.13. Let $G$ be an abelian group and $H$ a subgroup of $G$. For all $a, b, c \in G$, we have:

(i) $a \equiv a \pmod{H}$;

(ii) $a \equiv b \pmod{H}$ implies $b \equiv a \pmod{H}$;

(iii) $a \equiv b \pmod{H}$ and $b \equiv c \pmod{H}$ implies $a \equiv c \pmod{H}$.

Proof. For (i), observe that $H$ contains $0_G = a - a$. For (ii), observe that if $H$ contains $a - b$, then it also contains $-(a - b) = b - a$. For (iii), observe that if $H$ contains $a - b$ and $b - c$, then it also contains $(a - b) + (b - c) = a - c$. □

Since the binary relation “$\cdot \equiv \cdot \pmod{H}$” is an equivalence relation, it partitions $G$ into equivalence classes (see Theorem 2.1). For $a \in G$, we denote the equivalence class containing $a$ by $[a]_H$. By definition, we have

$$x \in [a]_H \iff x \equiv a \pmod{H} \iff x = a + h \text{ for some } h \in H,$$

and hence

$$[a]_H = a + H := \{a + h : h \in H\}.$$

It is also clear that $[0_G]_H = H$.

Historically, these equivalence classes are called cosets of $H$ in $G$, and we shall adopt this terminology here as well. Any member of a coset is called a representative of the coset.

Multiplicative notation: if $G$ is written multiplicatively, then $a \equiv b \pmod{H}$ means $ab^{-1} \in H$, and $[a]_H = aH := \{ah : h \in H\}$.

Example 6.29. Let $G := \mathbb{Z}$ and $H := n\mathbb{Z}$ for some positive integer $n$. Then $a \equiv b \pmod{H}$ if and only if $a \equiv b \pmod{n}$. The coset $[a]_H$ is exactly the same thing as the residue class $[a]_n \in \mathbb{Z}_n$. □

Example 6.30. Let $G := \mathbb{Z}_6$, which consists of the residue classes $[0], [1], [2], [3], [4], [5]$. Let $H$ be the subgroup $3G = \{[0], [3]\}$ of $G$. The coset of $H$ containing the residue class $[1]$ is $[1] + H = \{[1], [4]\}$, and the coset of $H$ containing the residue class $[2]$ is $[2] + H = \{[2], [5]\}$. The cosets $\{[0], [3]\}$, $\{[1], [4]\}$, and $\{[2], [5]\}$ are the only cosets of $H$ in $G$, and they clearly partition the set $\mathbb{Z}_6$. Note that each coset of $H$ in $G$ contains two elements, each of which is itself a coset of $6\mathbb{Z}$ in $\mathbb{Z}$ (i.e., a residue class modulo 6). □
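Cosets of a subgroup of $\mathbb{Z}_n$ can be listed mechanically. In the Python sketch below (hypothetical helper name `cosets`; residues written as $0, \ldots, n-1$), each coset $a + H$ is stored as a `frozenset`, so that equal cosets arising from different representatives collapse to a single entry. For $H = \{[0], [3]\}$ in $\mathbb{Z}_6$ it reproduces the three cosets of this example and checks that they all have the same size and partition $\mathbb{Z}_6$.

```python
def cosets(H: set, n: int) -> set:
    """All cosets a + H of a subgroup H of Z_n, one frozenset per coset."""
    return {frozenset((a + h) % n for h in H) for a in range(n)}


H = {0, 3}                                      # the subgroup 3 Z_6 of Z_6
C = cosets(H, 6)
print(sorted(sorted(c) for c in C))             # [[0, 3], [1, 4], [2, 5]]

assert all(len(c) == len(H) for c in C)         # every coset has |H| elements
assert set().union(*C) == set(range(6))         # the cosets cover Z_6 ...
assert sum(len(c) for c in C) == 6              # ... without overlapping
```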

In the previous example, we saw that each coset contained the same number of 
elements. As the next theorem shows, this was no accident. 
Theorem 6.14. Let $G$ be an abelian group and $H$ a subgroup of $G$. For all $a, b \in G$, the function

$$f : G \to G, \quad x \mapsto b - a + x$$

is a bijection, which, when restricted to the coset $[a]_H$, yields a bijection from $[a]_H$ to the coset $[b]_H$. In particular, every two cosets of $H$ in $G$ have the same cardinality.

Proof. First, we claim that $f$ is a bijection. Indeed, if $f(x) = f(x')$, then $b - a + x = b - a + x'$, and subtracting $b$ and adding $a$ to both sides of this equation yields $x = x'$. That proves that $f$ is injective. To prove that $f$ is surjective, observe that for any given $x' \in G$, we have $f(a - b + x') = x'$.

Second, we claim that for all $x \in G$, we have $x \in [a]_H$ if and only if $f(x) \in [b]_H$. On the one hand, suppose that $x \in [a]_H$, which means that $x = a + h$ for some $h \in H$. Subtracting $a$ and adding $b$ to both sides of this equation yields $b - a + x = b + h$, which means $f(x) \in [b]_H$. Conversely, suppose that $f(x) \in [b]_H$, which means that $b - a + x = b + h$ for some $h \in H$. Subtracting $b$ and adding $a$ to both sides of this equation yields $x = a + h$, which means that $x \in [a]_H$.

The theorem is now immediate from these two claims. □

An incredibly useful consequence of the above theorem is: 

Theorem 6.15 (Lagrange’s theorem). If $G$ is a finite abelian group, and $H$ is a subgroup of $G$, then the order of $H$ divides the order of $G$.

Proof. This is an immediate consequence of the previous theorem, and the fact that 
the cosets of H in G partition G. □ 

Analogous to Theorem 2.3, we have: 

Theorem 6.16. Suppose $G$ is an abelian group and $H$ is a subgroup of $G$. For all $a, a', b, b' \in G$, if $a \equiv a' \pmod{H}$ and $b \equiv b' \pmod{H}$, then we have $a + b \equiv a' + b' \pmod{H}$.

Proof. Now, $a \equiv a' \pmod{H}$ and $b \equiv b' \pmod{H}$ means that $a = a' + x$ and $b = b' + y$ for some $x, y \in H$. Therefore, $a + b = (a' + x) + (b' + y) = (a' + b') + (x + y)$, and since $x + y \in H$, this means that $a + b \equiv a' + b' \pmod{H}$. □

Let $G$ be an abelian group and $H$ a subgroup. Let $G/H$ denote the set of all cosets of $H$ in $G$. Theorem 6.16 allows us to define a binary operation on $G/H$ in the following natural way: for $a, b \in G$, define

$$[a]_H + [b]_H := [a + b]_H.$$
That this definition is unambiguous follows immediately from Theorem 6.16: if $[a]_H = [a']_H$ and $[b]_H = [b']_H$, then $[a + b]_H = [a' + b']_H$.
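The content of Theorem 6.16, that the sum of two cosets does not depend on the representatives chosen, can be checked exhaustively in small examples. The Python sketch below (helper names hypothetical; residues written as $0, \ldots, n-1$) adds every representative of $[a]_H$ to every representative of $[b]_H$ and confirms that all of the sums land in the single coset $[a + b]_H$.

```python
from itertools import product


def coset_of(a: int, H: set, n: int) -> frozenset:
    """The coset [a]_H = a + H of a subgroup H of Z_n."""
    return frozenset((a + h) % n for h in H)


def addition_is_well_defined(H: set, n: int) -> bool:
    """True if a' in [a]_H and b' in [b]_H always give a' + b' in [a + b]_H."""
    for a, b in product(range(n), repeat=2):
        target = coset_of(a + b, H, n)
        for a2 in coset_of(a, H, n):
            for b2 in coset_of(b, H, n):
                if coset_of(a2 + b2, H, n) != target:
                    return False
    return True


print(addition_is_well_defined({0, 3}, 6))       # True
print(addition_is_well_defined({0, 4, 8}, 12))   # True
```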

We can easily verify that this operation makes $G/H$ into an abelian group. We need to check that the four properties of Definition 6.1 are satisfied:

(i) Associativity:

$$[a]_H + ([b]_H + [c]_H) = [a]_H + [b + c]_H = [a + (b + c)]_H = [(a + b) + c]_H = [a + b]_H + [c]_H = ([a]_H + [b]_H) + [c]_H.$$

Here, we have used the definition of addition of cosets, and the corresponding associativity property for $G$.

(ii) Identity element: the coset $[0_G]_H = H$ acts as the identity element, since

$$[a]_H + [0_G]_H = [a + 0_G]_H = [a]_H = [0_G + a]_H = [0_G]_H + [a]_H.$$

(iii) Inverses: the inverse of the coset $[a]_H$ is $[-a]_H$, since

$$[a]_H + [-a]_H = [a + (-a)]_H = [0_G]_H = [(-a) + a]_H = [-a]_H + [a]_H.$$

(iv) Commutativity:

$$[a]_H + [b]_H = [a + b]_H = [b + a]_H = [b]_H + [a]_H.$$

The group $G/H$ is called the quotient group of $G$ modulo $H$. The order of the group $G/H$ is sometimes denoted $[G : H]$ and is called the index of $H$ in $G$. Note that if $H = G$, then the quotient group $G/H$ is the trivial group, and so $[G : H] = 1$.

Multiplicative notation: if $G$ is written multiplicatively, then the definition of the group operation of $G/H$ is expressed $[a]_H \cdot [b]_H := [a \cdot b]_H$; the identity element of $G/H$ is $[1_G]_H = H$, and the inverse of $[a]_H$ is $[a^{-1}]_H$.

Theorem 6.17. Suppose $G$ is a finite abelian group and $H$ is a subgroup of $G$. Then $[G : H] = |G|/|H|$. Moreover, if $K$ is a subgroup of $H$, then

$$[G : K] = [G : H]\,[H : K].$$

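Proof. The fact that $[G : H] = |G|/|H|$ follows directly from Theorem 6.14, since the cosets of $H$ partition $G$ and each of them has $|H|$ elements. The fact that $[G : K] = [G : H][H : K]$ follows from a simple calculation, applying the first statement to $K \subseteq G$, $H \subseteq G$, and $K \subseteq H$:

$$[G : K] = \frac{|G|}{|K|} = \frac{|G|}{|H|} \cdot \frac{|H|}{|K|} = [G : H]\,[H : K]. \quad □$$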

Example 6.31. For each $n \ge 1$, the group $\mathbb{Z}_n$ is precisely the quotient group $\mathbb{Z}/n\mathbb{Z}$. □
Example 6.32. Continuing with Example 6.30, let $G := \mathbb{Z}_6$ and $H := 3G = \{[0], [3]\}$. The quotient group $G/H$ has order 3, and consists of the cosets

$$\alpha := \{[0], [3]\}, \quad \beta := \{[1], [4]\}, \quad \gamma := \{[2], [5]\}.$$

If we write out an addition table for $G$, grouping together elements in cosets of $H$ in $G$, then we also get an addition table for the quotient group $G/H$:


$$
\begin{array}{c|cc|cc|cc}
+ & [0] & [3] & [1] & [4] & [2] & [5] \\
\hline
[0] & [0] & [3] & [1] & [4] & [2] & [5] \\
[3] & [3] & [0] & [4] & [1] & [5] & [2] \\
\hline
[1] & [1] & [4] & [2] & [5] & [3] & [0] \\
[4] & [4] & [1] & [5] & [2] & [0] & [3] \\
\hline
[2] & [2] & [5] & [3] & [0] & [4] & [1] \\
[5] & [5] & [2] & [0] & [3] & [1] & [4]
\end{array}
$$


This table illustrates quite graphically the point of Theorem 6.16: for every two 
cosets, if we take any element from the first and add it to any element of the second, 
we always end up in the same coset. 

We can also write down just the addition table for G/H: 


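$$
\begin{array}{c|ccc}
+ & \alpha & \beta & \gamma \\
\hline
\alpha & \alpha & \beta & \gamma \\
\beta & \beta & \gamma & \alpha \\
\gamma & \gamma & \alpha & \beta
\end{array}
$$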


Note that by replacing $\alpha$ with $[0]_3$, $\beta$ with $[1]_3$, and $\gamma$ with $[2]_3$, the addition table for $G/H$ becomes the addition table for $\mathbb{Z}_3$. In this sense, we can view $G/H$ as essentially just a “renaming” of $\mathbb{Z}_3$. □

Example 6.33. Let us return to Example 6.26. The multiplicative group $\mathbb{Z}_{15}^*$, as we saw, is of order 8. The subgroup $(\mathbb{Z}_{15}^*)^2$ of $\mathbb{Z}_{15}^*$ has order 2. Therefore, the quotient group $\mathbb{Z}_{15}^*/(\mathbb{Z}_{15}^*)^2$ has order 4. Indeed, the cosets are

$$\alpha_{00} := (\mathbb{Z}_{15}^*)^2 = \{[1], [4]\}, \quad \alpha_{01} := [-1](\mathbb{Z}_{15}^*)^2 = \{[-1], [-4]\},$$

$$\alpha_{10} := [2](\mathbb{Z}_{15}^*)^2 = \{[2], [-7]\}, \quad \alpha_{11} := [-2](\mathbb{Z}_{15}^*)^2 = \{[-2], [7]\}.$$

We can write down the multiplication table for the quotient group: 



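$$
\begin{array}{c|cccc}
\cdot & \alpha_{00} & \alpha_{01} & \alpha_{10} & \alpha_{11} \\
\hline
\alpha_{00} & \alpha_{00} & \alpha_{01} & \alpha_{10} & \alpha_{11} \\
\alpha_{01} & \alpha_{01} & \alpha_{00} & \alpha_{11} & \alpha_{10} \\
\alpha_{10} & \alpha_{10} & \alpha_{11} & \alpha_{00} & \alpha_{01} \\
\alpha_{11} & \alpha_{11} & \alpha_{10} & \alpha_{01} & \alpha_{00}
\end{array}
$$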


Note that this group is essentially just a “renaming” of the additive group $\mathbb{Z}_2 \times \mathbb{Z}_2$. □
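The cosets and multiplication table of this example can also be generated by machine. In the Python sketch below (helper names hypothetical), elements of $\mathbb{Z}_{15}^*$ are integers in $\{1, \ldots, 14\}$ coprime to 15, so $[-1], [-4], [-7], [-2]$ appear as 14, 11, 8, 13. The product of two cosets is computed from arbitrary representatives, and the code checks that the answer does not depend on that choice and that every coset squares to the identity coset, just as every element of $\mathbb{Z}_2 \times \mathbb{Z}_2$ is its own inverse.

```python
from math import gcd

n = 15
U = [a for a in range(1, n) if gcd(a, n) == 1]
S = frozenset(pow(a, 2, n) for a in U)           # (Z_15^*)^2 = {1, 4}


def coset(a: int) -> frozenset:
    """The coset a (Z_15^*)^2, in multiplicative notation."""
    return frozenset((a * s) % n for s in S)


def mul(A: frozenset, B: frozenset) -> frozenset:
    """(aS)(bS) := (ab)S, using the smallest element of each coset as representative."""
    return coset(min(A) * min(B))


Q = {coset(a) for a in U}
print(sorted(sorted(A) for A in Q))   # [[1, 4], [2, 8], [7, 13], [11, 14]]

# The product does not depend on the representatives chosen ...
assert all(coset(a * b) == mul(A, B) for A in Q for B in Q for a in A for b in B)
# ... and every coset squares to the identity coset S.
assert all(mul(A, A) == S for A in Q)
```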
Example 6.34. As we saw in Example 6.27, $(\mathbb{Z}_5^*)^2 = \{[\pm 1]\}$. Therefore, the quotient group $\mathbb{Z}_5^*/(\mathbb{Z}_5^*)^2$ has order 2. The cosets of $(\mathbb{Z}_5^*)^2$ in $\mathbb{Z}_5^*$ are $\alpha_0 := \{[\pm 1]\}$ and $\alpha_1 := \{[\pm 2]\}$, and the multiplication table looks like this:



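$$
\begin{array}{c|cc}
\cdot & \alpha_0 & \alpha_1 \\
\hline
\alpha_0 & \alpha_0 & \alpha_1 \\
\alpha_1 & \alpha_1 & \alpha_0
\end{array}
$$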


We see that the quotient group is essentially just a “renaming” of $\mathbb{Z}_2$. □

Exercise 6.15. Write down the cosets of (Z^) 2 in Zj 5 , along with the multipli- 
cation table for the quotient group Z^/fZ^) 2 . 

Exercise 6.16. Let $n$ be an odd, positive integer whose factorization into primes is $n = p_1^{e_1} \cdots p_r^{e_r}$. Show that $[\mathbb{Z}_n^* : (\mathbb{Z}_n^*)^2] = 2^r$.

Exercise 6.17. Let $n$ be a positive integer, and let $m$ be any integer. Show that $[\mathbb{Z}_n : m\mathbb{Z}_n] = n/\gcd(m, n)$.

Exercise 6.18. Let $G$ be an abelian group and $H$ a subgroup with $[G : H] = 2$. Show that if $a, b \in G \setminus H$, then $a + b \in H$.

Exercise 6.19. Let $H$ be a subgroup of an abelian group $G$, and let $a, b \in G$ with $a \equiv b \pmod{H}$. Show that $ka \equiv kb \pmod{H}$ for all $k \in \mathbb{Z}$.

Exercise 6.20. Let $G$ be an abelian group, and let $\sim$ be an equivalence relation on $G$. Further, suppose that for all $a, a', b \in G$, if $a \sim a'$, then $a + b \sim a' + b$. Let $H := \{a \in G : a \sim 0_G\}$. Show that $H$ is a subgroup of $G$, and that for all $a, b \in G$, we have $a \sim b$ if and only if $a \equiv b \pmod{H}$.

Exercise 6.21. Let $H$ be a subgroup of an abelian group $G$, and let $a, b \in G$. Show that $[a + b]_H = \{x + y : x \in [a]_H, \ y \in [b]_H\}$.


6.4 Group homomorphisms and isomorphisms 

In this section, we study maps that relate the structure of one group to another. Such maps are often very useful, as they may allow us to transfer hard-won knowledge about one group to another, perhaps more mysterious, group.

Definition 6.18. A group homomorphism is a function $\rho$ from an abelian group $G$ to an abelian group $G'$ such that $\rho(a + b) = \rho(a) + \rho(b)$ for all $a, b \in G$.

Note that in the equality $\rho(a + b) = \rho(a) + \rho(b)$ in the above definition, the addition on the left-hand side is taking place in the group $G$ while the addition on the right-hand side is taking place in the group $G'$.
Two sets play a critical role in the study of a group homomorphism $\rho : G \to G'$. The first set is the image of $\rho$, that is, the set $\rho(G) = \{\rho(a) : a \in G\}$. The second set is the kernel of $\rho$, defined as the set of all elements of $G$ that are mapped to $0_{G'}$ by $\rho$, that is, the set $\rho^{-1}(\{0_{G'}\}) = \{a \in G : \rho(a) = 0_{G'}\}$. We introduce the following notation for these sets: $\operatorname{Im}\rho$ denotes the image of $\rho$, and $\operatorname{Ker}\rho$ denotes the kernel of $\rho$.

Example 6.35. If $H$ is a subgroup of an abelian group $G$, then the inclusion map $i : H \to G$ is obviously a group homomorphism. □

Example 6.36. Suppose $H$ is a subgroup of an abelian group $G$. We define the map

$$\rho : G \to G/H, \quad a \mapsto [a]_H.$$

It is not hard to see that this is a group homomorphism. Indeed, this follows almost immediately from the way we defined addition in the quotient group $G/H$:

$$\rho(a + b) = [a + b]_H = [a]_H + [b]_H = \rho(a) + \rho(b).$$

It is clear that $\rho$ is surjective. It is also not hard to see that $\operatorname{Ker}\rho = H$; indeed, $H$ is the identity element in $G/H$, and $[a]_H = H$ if and only if $a \in H$. The map $\rho$ is called the natural map from $G$ to $G/H$. □

Example 6.37. For a given positive integer $n$, the natural map from $\mathbb{Z}$ to $\mathbb{Z}_n$ sends $a \in \mathbb{Z}$ to the residue class $[a]_n$. This map is a surjective group homomorphism with kernel $n\mathbb{Z}$. □

Example 6.38. Suppose $G$ is an abelian group and $m$ is an integer. The map

$$\rho : G \to G, \quad a \mapsto ma$$

is a group homomorphism, since

$$\rho(a + b) = m(a + b) = ma + mb = \rho(a) + \rho(b).$$

The image of this homomorphism is the subgroup $mG$ and the kernel is the subgroup $G\{m\}$. We call this map the $m$-multiplication map on $G$. If $G$ is written multiplicatively, then this map, which sends $a \in G$ to $a^m \in G$, is called the $m$-power map on $G$, and its image is $G^m$. □

Example 6.39. Let $p$ be an odd prime. Consider the 2-power, or squaring, map on $\mathbb{Z}_p^*$. Then as we saw in Example 6.21, the image $(\mathbb{Z}_p^*)^2$ of this map is a subgroup of $\mathbb{Z}_p^*$ of order $(p - 1)/2$, and its kernel is $\mathbb{Z}_p^*\{2\} = \{[\pm 1]\}$. □
Example 6.40. Consider the $m$-multiplication map on $\mathbb{Z}$. As we saw in Example 6.22, its image $m\mathbb{Z}$ is equal to $\mathbb{Z}$ if and only if $m = \pm 1$, while its kernel $\mathbb{Z}\{m\}$ is equal to $\mathbb{Z}$ if $m = 0$, and is equal to $\{0\}$ otherwise. □

Example 6.41. Consider the $m$-multiplication map on $\mathbb{Z}_n$. As we saw in Example 6.23, if $d := \gcd(m, n)$, the image $m\mathbb{Z}_n$ of this map is a subgroup of $\mathbb{Z}_n$ of order $n/d$, while its kernel $\mathbb{Z}_n\{m\}$ is a subgroup of order $d$. □
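Image and kernel can be computed directly from the definitions for maps between small groups $\mathbb{Z}_n$. In the Python sketch below (helper names hypothetical), a map $\mathbb{Z}_n \to \mathbb{Z}_{n'}$ is given as a Python function on the representatives $0, \ldots, n-1$ whose output the helpers reduce modulo $n'$. The code checks the homomorphism property by brute force, recovers the orders $n/d$ and $d$ of this example for $n = 15$, $m = 6$, and also shows that reduction $\mathbb{Z}_{12} \to \mathbb{Z}_4$ is a homomorphism, while sending the representative $a \in \{0, \ldots, 11\}$ to $[a]_5$ is not.

```python
from math import gcd


def is_homomorphism(phi, n_src: int, n_dst: int) -> bool:
    """Brute-force check of phi(a + b) = phi(a) + phi(b), with phi : Z_{n_src} -> Z_{n_dst}."""
    return all(phi((a + b) % n_src) % n_dst == (phi(a) + phi(b)) % n_dst
               for a in range(n_src) for b in range(n_src))


def image_and_kernel(phi, n_src: int, n_dst: int):
    """Return (Im phi, Ker phi) as sets of representatives."""
    image = {phi(a) % n_dst for a in range(n_src)}
    kernel = {a for a in range(n_src) if phi(a) % n_dst == 0}
    return image, kernel


n, m = 15, 6
d = gcd(m, n)                                    # d = 3


def times_m(a: int) -> int:
    """The m-multiplication map on Z_n (the helpers reduce the output mod n)."""
    return m * a


assert is_homomorphism(times_m, n, n)
img, ker = image_and_kernel(times_m, n, n)
assert len(img) == n // d and len(ker) == d      # orders n/d = 5 and d = 3


def as_residue(a: int) -> int:
    """Send the representative a in {0, ..., 11} to its class in the target group."""
    return a


assert is_homomorphism(as_residue, 12, 4)        # reduction Z_12 -> Z_4
assert not is_homomorphism(as_residue, 12, 5)    # fails, e.g., for a = b = 6
img2, ker2 = image_and_kernel(as_residue, 12, 4)
print(sorted(img2), sorted(ker2))                # [0, 1, 2, 3] [0, 4, 8]
```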

Example 6.42. Suppose $G$ is an abelian group and $a$ is an element of $G$. It is easy to see that the map

$$\rho : \mathbb{Z} \to G, \quad z \mapsto za$$

is a group homomorphism, since

$$\rho(z + z') = (z + z')a = za + z'a = \rho(z) + \rho(z'). \quad □$$

Example 6.43. As a special case of the previous example, let $n$ be a positive integer and let $a$ be an element of $\mathbb{Z}_n^*$. Let $\rho : \mathbb{Z} \to \mathbb{Z}_n^*$ be the group homomorphism that sends $z \in \mathbb{Z}$ to $a^z \in \mathbb{Z}_n^*$. That $\rho$ is a group homomorphism means that $a^{z + z'} = a^z a^{z'}$ for all $z, z' \in \mathbb{Z}$ (note that the group operation is addition in $\mathbb{Z}$ and multiplication in $\mathbb{Z}_n^*$). If the multiplicative order of $a$ is equal to $k$, then as discussed in §2.7, the image of $\rho$ consists of the $k$ distinct group elements $a^0, a^1, \ldots, a^{k-1}$. The kernel of $\rho$ consists of those integers $z$ such that $a^z = 1$. Again by the discussion in §2.7, the kernel of $\rho$ is equal to the subgroup $k\mathbb{Z}$. □
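Here is a small numerical check of this example with $n = 15$ and $a = [2]$, sketched in Python (the helper `mult_order` is hypothetical, and negative exponents in `pow` need Python 3.8 or later). Since $2^4 = 16 \equiv 1 \pmod{15}$, the multiplicative order is $k = 4$. The code confirms that the image of $z \mapsto a^z$ has exactly $k$ elements and that $a^z = 1$ precisely when $k \mid z$, i.e., that the kernel is $k\mathbb{Z}$ (checked on a finite range of $z$).

```python
def mult_order(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) = 1, else it loops forever."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


n, a = 15, 2
k = mult_order(a, n)
print(k)                                                   # 4
image = {pow(a, z, n) for z in range(-3 * k, 3 * k)}
assert image == {pow(a, i, n) for i in range(k)}           # {a^0, ..., a^(k-1)}
assert len(image) == k
assert all((pow(a, z, n) == 1) == (z % k == 0) for z in range(-20, 21))  # Ker = kZ
```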

Example 6.44. Generalizing Example 6.42, the reader may verify that if $a_1, \ldots, a_k$ are fixed elements of an abelian group $G$, then the map

$$\rho : \mathbb{Z}^{\times k} \to G, \quad (z_1, \ldots, z_k) \mapsto z_1 a_1 + \cdots + z_k a_k$$

is a group homomorphism. □

Example 6.45. Suppose that $H_1, \ldots, H_k$ are subgroups of an abelian group $G$. The reader may easily verify that the map

$$\rho : H_1 \times \cdots \times H_k \to G, \quad (a_1, \ldots, a_k) \mapsto a_1 + \cdots + a_k$$

is a group homomorphism whose image is the subgroup $H_1 + \cdots + H_k$. □

The following theorem summarizes some of the most important properties of group homomorphisms.
Theorem 6.19. Let $\rho$ be a group homomorphism from $G$ to $G'$. Then:

(i) $\rho(0_G) = 0_{G'}$;

(ii) $\rho(-a) = -\rho(a)$ for all $a \in G$;

(iii) $\rho(na) = n\rho(a)$ for all $n \in \mathbb{Z}$ and $a \in G$;

(iv) if $H$ is a subgroup of $G$, then $\rho(H)$ is a subgroup of $G'$; in particular (setting $H := G$), $\operatorname{Im}\rho$ is a subgroup of $G'$;

(v) if $H'$ is a subgroup of $G'$, then $\rho^{-1}(H')$ is a subgroup of $G$; in particular (setting $H' := \{0_{G'}\}$), $\operatorname{Ker}\rho$ is a subgroup of $G$;

(vi) for all $a, b \in G$, $\rho(a) = \rho(b)$ if and only if $a \equiv b \pmod{\operatorname{Ker}\rho}$;

(vii) $\rho$ is injective if and only if $\operatorname{Ker}\rho = \{0_G\}$.

Proof. These are all straightforward calculations.

(i) We have

$$0_{G'} + \rho(0_G) = \rho(0_G) = \rho(0_G + 0_G) = \rho(0_G) + \rho(0_G).$$

Now cancel $\rho(0_G)$ from both sides.

(ii) We have

$$0_{G'} = \rho(0_G) = \rho(a + (-a)) = \rho(a) + \rho(-a),$$

and hence $\rho(-a)$ is the inverse of $\rho(a)$.

(iii) For $n = 0$, this follows from part (i). For $n > 0$, this follows from the definitions by induction on $n$. For $n < 0$, this follows from the positive case and part (ii).

(iv) The set $\rho(H)$ is non-empty, since $H$ is. For all $a, b \in H$, we have $a + b \in H$ and $-a \in H$; hence, $\rho(H)$ contains $\rho(a + b) = \rho(a) + \rho(b)$ and $\rho(-a) = -\rho(a)$.

(v) $\rho^{-1}(H')$ is non-empty, since $\rho(0_G) = 0_{G'} \in H'$. If $\rho(a) \in H'$ and $\rho(b) \in H'$, then $\rho(a + b) = \rho(a) + \rho(b) \in H'$, and $\rho(-a) = -\rho(a) \in H'$.

(vi) We have

$$\rho(a) = \rho(b) \iff \rho(a) - \rho(b) = 0_{G'} \iff \rho(a - b) = 0_{G'} \iff a - b \in \operatorname{Ker}\rho \iff a \equiv b \pmod{\operatorname{Ker}\rho}.$$

(vii) If $\rho$ is injective, then in particular, $\rho^{-1}(\{0_{G'}\})$ cannot contain any other element besides $0_G$. If $\rho$ is not injective, then there exist two distinct elements $a, b \in G$ with $\rho(a) = \rho(b)$, and by part (vi), $\operatorname{Ker}\rho$ contains the element $a - b$, which is non-zero. □

Part (vii) of the above theorem is particularly useful: to check that a group homomorphism is injective, it suffices to determine if $\operatorname{Ker}\rho = \{0_G\}$. Thus, the injectivity and surjectivity of a given group homomorphism $\rho : G \to G'$ may be characterized in terms of its kernel and image:

• $\rho$ is injective if and only if its kernel is trivial (i.e., $\operatorname{Ker}\rho = \{0_G\}$);

• $\rho$ is surjective if and only if $\operatorname{Im}\rho = G'$.
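The kernel criterion is easy to compare against a direct injectivity test. The Python sketch below (helper names hypothetical) does this for every $m$-multiplication map on $\mathbb{Z}_{12}$: the map is injective exactly when its kernel is $\{0\}$, which happens for $m \in \{1, 5, 7, 11\}$, the residues coprime to 12.

```python
n = 12


def kernel(m: int) -> set:
    """Kernel of the m-multiplication map on Z_n."""
    return {a for a in range(n) if (m * a) % n == 0}


def is_injective(m: int) -> bool:
    """Direct test: the n values m*a mod n are pairwise distinct."""
    return len({(m * a) % n for a in range(n)}) == n


for m in range(n):
    assert is_injective(m) == (kernel(m) == {0})

print([m for m in range(n) if kernel(m) == {0}])   # [1, 5, 7, 11]
```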

We next present two very easy theorems that allow us to compose group homomorphisms in simple ways.

Theorem 6.20. If $\rho : G \to G'$ and $\rho' : G' \to G''$ are group homomorphisms, then so is their composition $\rho' \circ \rho : G \to G''$.

Proof. For all $a, b \in G$, we have

$$\rho'(\rho(a + b)) = \rho'(\rho(a) + \rho(b)) = \rho'(\rho(a)) + \rho'(\rho(b)). \quad □$$

Theorem 6.21. Let $\rho_i : G \to G_i'$, for $i = 1, \ldots, k$, be group homomorphisms. Then the map

$$\rho : G \to G_1' \times \cdots \times G_k', \quad a \mapsto (\rho_1(a), \ldots, \rho_k(a))$$

is a group homomorphism.

Proof. For all $a, b \in G$, we have

$$\rho(a + b) = (\rho_1(a + b), \ldots, \rho_k(a + b)) = (\rho_1(a) + \rho_1(b), \ldots, \rho_k(a) + \rho_k(b)) = \rho(a) + \rho(b). \quad □$$
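For a concrete instance of Theorem 6.21, take $G = \mathbb{Z}_{12}$, and let $\rho_1 : \mathbb{Z}_{12} \to \mathbb{Z}_4$ and $\rho_2 : \mathbb{Z}_{12} \to \mathbb{Z}_6$ be the reduction maps $[a]_{12} \mapsto [a]_4$ and $[a]_{12} \mapsto [a]_6$ (these are well-defined homomorphisms because 4 and 6 divide 12; the choice is ours, purely for illustration). The Python sketch below checks that the combined map $a \mapsto (\rho_1(a), \rho_2(a))$ respects componentwise addition in $\mathbb{Z}_4 \times \mathbb{Z}_6$. Its kernel turns out to be trivial, so it is injective; it is not surjective, since $\mathbb{Z}_4 \times \mathbb{Z}_6$ has 24 elements.

```python
def rho(a: int) -> tuple:
    """rho(a) = (rho_1(a), rho_2(a)) = ([a]_4, [a]_6) for a representative a of Z_12."""
    return (a % 4, a % 6)


def add_pair(x: tuple, y: tuple) -> tuple:
    """Componentwise addition in Z_4 x Z_6."""
    return ((x[0] + y[0]) % 4, (x[1] + y[1]) % 6)


# Homomorphism property: rho(a + b) = rho(a) + rho(b) for all a, b in Z_12.
assert all(rho((a + b) % 12) == add_pair(rho(a), rho(b))
           for a in range(12) for b in range(12))

kernel = [a for a in range(12) if rho(a) == (0, 0)]
image = {rho(a) for a in range(12)}
print(kernel, len(image))      # [0] 12
```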

Consider a group homomorphism $\rho : G \to G'$. If $\rho$ is bijective, then $\rho$ is called a group isomorphism of $G$ with $G'$. If such a group isomorphism $\rho$ exists, we say that $G$ is isomorphic to $G'$, and write $G \cong G'$. Moreover, if $G = G'$, then $\rho$ is called a group automorphism on $G$.

Theorem 6.22. If $\rho$ is a group isomorphism of $G$ with $G'$, then the inverse function $\rho^{-1}$ is a group isomorphism of $G'$ with $G$.

Proof. For all $a', b' \in G'$, we have

$$\rho(\rho^{-1}(a') + \rho^{-1}(b')) = \rho(\rho^{-1}(a')) + \rho(\rho^{-1}(b')) = a' + b',$$

and hence $\rho^{-1}(a') + \rho^{-1}(b') = \rho^{-1}(a' + b')$. □

Because of this theorem, if $G$ is isomorphic to $G'$, we may simply say that “$G$ and $G'$ are isomorphic.”

We stress that a group isomorphism $\rho : G \to G'$ is essentially just a “renaming” of the group elements. This can be visualized as follows. Imagine the addition table for $G$ written out with rows and columns labeled by elements of $G$, with the entry in row $a$ and column $b$ being $a + b$. Now suppose we use the function $\rho$ to consistently rename all the elements of $G$ appearing in this table: the label on row $a$ is replaced by $\rho(a)$, the label on column $b$ by $\rho(b)$, and the entry in row $a$ and column $b$ by $\rho(a + b)$. Because $\rho$ is bijective, every element of $G'$ appears exactly once as a label on a row and as a label on a column; moreover, because $\rho(a + b) = \rho(a) + \rho(b)$, what we end up with is an addition table for $G'$. It follows that all structural properties of the group are preserved, even though the two groups might look quite different syntactically.

Example 6.46. As was shown in Example 6.32, the quotient group $G/H$ discussed in that example is isomorphic to $\mathbb{Z}_3$. As was shown in Example 6.33, the quotient group $\mathbb{Z}_{15}^*/(\mathbb{Z}_{15}^*)^2$ is isomorphic to $\mathbb{Z}_2 \times \mathbb{Z}_2$. As was shown in Example 6.34, the quotient group $\mathbb{Z}_5^*/(\mathbb{Z}_5^*)^2$ is isomorphic to $\mathbb{Z}_2$. □

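Example 6.47. If $\gcd(m, n) = 1$, then the $m$-multiplication map on $\mathbb{Z}_n$ is a group automorphism: by Example 6.41, its kernel has order $\gcd(m, n) = 1$, so it is injective, and an injective map from the finite set $\mathbb{Z}_n$ to itself is also surjective. □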
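This is easy to test numerically. The Python sketch below (hypothetical helper name) checks, for $n = 20$, that the $m$-multiplication map on $\mathbb{Z}_{20}$ is a bijection exactly when $\gcd(m, 20) = 1$ (the converse direction also follows from Example 6.41, since the kernel has order $\gcd(m, n)$), and that its inverse is the $m'$-multiplication map, where $mm' \equiv 1 \pmod{20}$. In Python 3.8 and later, `pow(m, -1, n)` computes such an $m'$.

```python
from math import gcd

n = 20


def is_automorphism(m: int) -> bool:
    """The m-multiplication map on Z_n is bijective iff its image is all of Z_n."""
    return {(m * a) % n for a in range(n)} == set(range(n))


assert all(is_automorphism(m) == (gcd(m, n) == 1) for m in range(n))

for m in range(n):
    if gcd(m, n) == 1:
        m_inv = pow(m, -1, n)                    # m * m_inv = 1 (mod n)
        # The inverse automorphism is multiplication by m_inv.
        assert all((m_inv * m * a) % n == a for a in range(n))

print([m for m in range(n) if is_automorphism(m)])   # [1, 3, 7, 9, 11, 13, 17, 19]
```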


The next theorem tells us that corresponding to any group homomorphism, there is a natural group isomorphism. As group isomorphisms are much nicer than group homomorphisms, this is often very useful.


Theorem 6.23 (First isomorphism theorem). Let $\rho : G \to G'$ be a group homomorphism with kernel $K$ and image $H'$. Then we have a group isomorphism

$$G/K \cong H'.$$

Specifically, the map

$$\bar\rho : G/K \to G', \quad [a]_K \mapsto \rho(a)$$

is an injective group homomorphism whose image is $H'$.


Proof. Using part (vi) of Theorem 6.19, we see that for all a,beG, we have 

[o\k = \b\K 4=> a = b (mod K) p(a) = p{b). 

This immediately implies that the definition of p is unambiguous (\o\k = [P\k 
implies p(a) = p{b)), and that p is injective (p(a) = p(b ) implies [a]K = [6 ]k). 
It is clear that p maps onto IP, since every element of IP is of the form p(a) for 
some a e G, and the map p sends [o\k to pi a). Finally, to see that p is a group 
homomorphism, note that 

p{[a]K + \h\K) = p([a + b] K ) = p{a + b) = p(a) + p(b) = p{[a] K ) + p([b] K )- □ 
We can generalize the previous theorem, as follows: 
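
Theorem 6.24. Let $\rho : G \to G'$ be a group homomorphism. Then for every
subgroup $H$ of $G$ with $H \subseteq \operatorname{Ker} \rho$, we may define a group homomorphism

$$\overline{\rho} : G/H \to G', \quad [a]_H \mapsto \rho(a).$$

Moreover, $\operatorname{Im} \overline{\rho} = \operatorname{Im} \rho$, and $\overline{\rho}$ is injective if and only if $H = \operatorname{Ker} \rho$.

Proof. Using the assumption that $H \subseteq \operatorname{Ker} \rho$, we see that $\overline{\rho}$ is unambiguously
defined, since for all $a, b \in G$, we have

$$[a]_H = [b]_H \implies a \equiv b \pmod{H} \implies a \equiv b \pmod{\operatorname{Ker} \rho} \implies \rho(a) = \rho(b).$$

That $\overline{\rho}$ is a group homomorphism, with $\operatorname{Im} \overline{\rho} = \operatorname{Im} \rho$, follows as in the proof of
Theorem 6.23. If $H = \operatorname{Ker} \rho$, then by Theorem 6.23, $\overline{\rho}$ is injective, and if $H \subsetneq \operatorname{Ker} \rho$,
then $\overline{\rho}$ is not injective, since if we choose $a \in \operatorname{Ker} \rho \setminus H$, we see that $\overline{\rho}([a]_H) = 0_{G'}$
while $[a]_H \neq [0]_H$ (because $a \notin H$), and hence $\operatorname{Ker} \overline{\rho}$ is non-trivial. □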

The next theorem gives us another important construction of a group isomorphism.

Theorem 6.25 (Internal direct product). Let $G$ be an abelian group with
subgroups $H_1, H_2$, where $H_1 \cap H_2 = \{0_G\}$. Then we have a group isomorphism

$$H_1 \times H_2 \cong H_1 + H_2$$

given by the map

$$\rho : H_1 \times H_2 \to H_1 + H_2, \quad (a_1, a_2) \mapsto a_1 + a_2.$$

Proof. We already saw that $\rho$ is a surjective group homomorphism in Example 6.45.
To see that $\rho$ is injective, it suffices to show that $\operatorname{Ker} \rho$ is trivial; that is, it suffices
to show that for all $a_1 \in H_1$ and $a_2 \in H_2$, if $a_1 + a_2 = 0_G$, then $a_1 = a_2 = 0_G$. But
$a_1 + a_2 = 0_G$ implies $a_1 = -a_2 \in H_2$, and hence $a_1 \in H_1 \cap H_2 = \{0_G\}$, and so
$a_1 = 0_G$. Similarly, one shows that $a_2 = 0_G$, and that finishes the proof. □

If $H_1, H_2$ are as in the above theorem, then $H_1 + H_2$ is sometimes called the
internal direct product of $H_1$ and $H_2$.

Example 6.48. We can use the general theory developed so far to get a
quick-and-dirty proof of the Chinese remainder theorem (Theorem 2.6). Let $\{n_i\}_{i=1}^k$ be a
pairwise relatively prime family of positive integers, and let $n := \prod_{i=1}^k n_i$. Consider
the map

$$\rho : \mathbb{Z} \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}, \quad a \mapsto ([a]_{n_1}, \ldots, [a]_{n_k}).$$

It is easy to see that this map is a group homomorphism; indeed, it is the map
constructed in Theorem 6.21 applied with the natural maps $\rho_i : \mathbb{Z} \to \mathbb{Z}_{n_i}$, for
$i = 1, \ldots, k$. Evidently, $a \in \operatorname{Ker} \rho$ if and only if $n_i \mid a$ for $i = 1, \ldots, k$, and since
$\{n_i\}$ is pairwise relatively prime, it follows that $a \in \operatorname{Ker} \rho$ if and only if $n \mid a$;
that is, $\operatorname{Ker} \rho = n\mathbb{Z}$. Theorem 6.23 then gives us an injective group homomorphism

$$\overline{\rho} : \mathbb{Z}_n \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}, \quad [a]_n \mapsto ([a]_{n_1}, \ldots, [a]_{n_k}).$$

But since the sets $\mathbb{Z}_n$ and $\mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$ have the same size, injectivity implies
surjectivity. From this, Theorem 2.6 is immediate.

The map $\overline{\rho}$ is a group isomorphism

$$\mathbb{Z}_n \cong \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}.$$

In fact, the map $\overline{\rho}$ is the same as the map $\theta$ in Theorem 2.8, and so we also
immediately obtain parts (i), (ii), (iii.a), and (iii.b) of that theorem.

Observe that parts (iii.c) and (iii.d) of Theorem 2.8 imply that restricting the
map $\theta$ to $\mathbb{Z}_n^*$ yields an isomorphism of multiplicative groups

$$\mathbb{Z}_n^* \cong \mathbb{Z}_{n_1}^* \times \cdots \times \mathbb{Z}_{n_k}^*.$$

This fact does not follow from the general theory developed so far; however, in the
next chapter, we will see how this fact fits into the broader algebraic picture.

One advantage of our original proof of Theorem 2.6 is that it gives us an explicit
formula for the inverse map $\theta^{-1}$, which is useful in computations. □
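To make the last remark concrete, here is a minimal Python sketch of the map $\overline{\rho} = \theta$ and of an inverse map, for a concrete family of pairwise relatively prime moduli. The function names `crt_map` and `crt_inverse` are hypothetical, and the inverse uses the standard explicit formula $a \equiv \sum_i a_i e_i \pmod{n}$, where $e_i := (n/n_i) t_i$ and $t_i$ is an inverse of $n/n_i$ modulo $n_i$; we assume this is the kind of formula the proof of Theorem 2.6 provides, not necessarily its exact form.

```python
from math import prod


def crt_map(a, moduli):
    """The map [a]_n -> ([a]_{n_1}, ..., [a]_{n_k}) from Example 6.48."""
    return tuple(a % n_i for n_i in moduli)


def crt_inverse(residues, moduli):
    """Inverse map, assuming the moduli are pairwise relatively prime.

    Uses a = sum(a_i * e_i) mod n, where e_i = (n / n_i) * t_i and t_i is an
    inverse of n / n_i modulo n_i, so that e_i = 1 (mod n_i) and e_i = 0
    (mod n_j) for j != i.
    """
    n = prod(moduli)
    a = 0
    for a_i, n_i in zip(residues, moduli):
        m_i = n // n_i
        t_i = pow(m_i, -1, n_i)  # modular inverse; requires Python 3.8+
        a += a_i * m_i * t_i
    return a % n


moduli = (3, 5, 7)
print(crt_map(52, moduli))             # (1, 2, 3)
print(crt_inverse((1, 2, 3), moduli))  # 52
```

The two calls exhibit the correspondence $[52]_{105} \leftrightarrow ([1]_3, [2]_5, [3]_7)$ in both directions.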

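Example 6.49. Let $n_1, n_2$ be positive integers with $n_1 \mid n_2$. Consider the natural
map $\rho : \mathbb{Z} \to \mathbb{Z}_{n_1}$. This is a surjective group homomorphism with $\operatorname{Ker} \rho = n_1\mathbb{Z}$.
Since $H := n_2\mathbb{Z} \subseteq n_1\mathbb{Z}$, we may apply Theorem 6.24 with the subgroup $H$,
obtaining the surjective group homomorphism

$$\overline{\rho} : \mathbb{Z}_{n_2} \to \mathbb{Z}_{n_1}, \quad [a]_{n_2} \mapsto [a]_{n_1}. \quad □$$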

Example 6.50. Let us revisit Example 6.23. Let $n$ be a positive integer, and let $m$
be any integer. Let $\rho_1 : \mathbb{Z} \to \mathbb{Z}_n$ be the natural map, and let $\rho_2 : \mathbb{Z}_n \to \mathbb{Z}_n$ be
the $m$-multiplication map. The composed map $\rho := \rho_2 \circ \rho_1$ from $\mathbb{Z}$ to $\mathbb{Z}_n$ is also
a group homomorphism. For each $z \in \mathbb{Z}$, we have $\rho(z) = m[z]_n = [mz]_n$. The
kernel of $\rho$ consists of those integers $z$ such that $mz \equiv 0 \pmod{n}$, and so part (ii)
of Theorem 2.5 implies that $\operatorname{Ker} \rho = (n/d)\mathbb{Z}$, where $d := \gcd(m, n)$. The image of
$\rho$ is $m\mathbb{Z}_n$. Theorem 6.23 therefore implies that the map

$$\overline{\rho} : \mathbb{Z}_{n/d} \to m\mathbb{Z}_n, \quad [z]_{n/d} \mapsto m[z]_n$$

is a group isomorphism. □
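Example 6.50 can be checked numerically for small parameters. The Python sketch below (all function names are hypothetical) lists the kernel of $\rho$ over one period $z = 0, \ldots, n - 1$, which should consist of the multiples of $n/d$, and evaluates $\overline{\rho}$ on the representatives $0, \ldots, n/d - 1$ of $\mathbb{Z}_{n/d}$, which should give $n/d$ distinct values filling out $m\mathbb{Z}_n$.

```python
from math import gcd


def rho(z, m, n):
    """rho(z) = [m z]_n: the natural map Z -> Z_n followed by m-multiplication."""
    return (m * z) % n


def kernel_and_image(m, n):
    """Kernel residues (z in 0..n-1 with m z = 0 mod n) and the image m Z_n."""
    kernel = [z for z in range(n) if rho(z, m, n) == 0]
    image = {rho(z, m, n) for z in range(n)}
    return kernel, image


def rho_bar_values(m, n):
    """Values of [z]_{n/d} -> m[z]_n on the representatives z = 0, ..., n/d - 1."""
    d = gcd(m, n)
    return [rho(z, m, n) for z in range(n // d)]


kernel, image = kernel_and_image(4, 6)
print(kernel, sorted(image))  # [0, 3] [0, 2, 4]
print(rho_bar_values(4, 6))   # [0, 4, 2]
```

With $m = 4$ and $n = 6$ we have $d = 2$: the kernel is $3\mathbb{Z}$ and $\overline{\rho}$ maps $\mathbb{Z}_3$ bijectively onto $4\mathbb{Z}_6 = \{[0], [2], [4]\}$.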

Example 6.51. Consider the group $\mathbb{Z}_p^*$ where $p$ is an odd prime, and let $\rho : \mathbb{Z}_p^* \to \mathbb{Z}_p^*$
be the squaring map. By definition, $\operatorname{Im} \rho = (\mathbb{Z}_p^*)^2$, and we proved in Theorem 2.18
that $\operatorname{Ker} \rho = \{[\pm 1]\}$. Theorem 2.19 says that for all $\gamma, \beta \in \mathbb{Z}_p^*$, $\gamma^2 = \beta^2$ if
and only if $\gamma = \pm\beta$. This fact can also be seen to be a special case of part
(vi) of Theorem 6.19. Theorem 6.23 says that $\mathbb{Z}_p^*/\operatorname{Ker} \rho \cong \operatorname{Im} \rho$, and since
$|\mathbb{Z}_p^*/\operatorname{Ker} \rho| = |\mathbb{Z}_p^*|/|\operatorname{Ker} \rho| = (p - 1)/2$, we see that Theorem 2.20, which says
that $|(\mathbb{Z}_p^*)^2| = (p - 1)/2$, follows from this.

Let $H := (\mathbb{Z}_p^*)^2$, and consider the quotient group $\mathbb{Z}_p^*/H$. Since $|H| = (p - 1)/2$,
we know that $|\mathbb{Z}_p^*/H| = |\mathbb{Z}_p^*|/|H| = 2$, and hence $\mathbb{Z}_p^*/H$ consists of the two cosets
$H$ and $\overline{H} := \mathbb{Z}_p^* \setminus H$.

Let $\alpha$ be an arbitrary, fixed element of $\overline{H}$, and consider the map

$$\tau : \mathbb{Z} \to \mathbb{Z}_p^*/H, \quad z \mapsto [\alpha^z]_H.$$

It is easy to see that $\tau$ is a group homomorphism; indeed, it is the composition
of the homomorphism discussed in Example 6.43 and the natural map from $\mathbb{Z}_p^*$ to
$\mathbb{Z}_p^*/H$. Moreover, it is easy to see (for example, as a special case of Theorem 2.17)
that

$$\alpha^z \in H \iff z \text{ is even}.$$

From this, it follows that $\operatorname{Ker} \tau = 2\mathbb{Z}$; also, since $\mathbb{Z}_p^*/H$ consists of just the two
cosets $H$ and $\overline{H}$, it follows that $\tau$ is surjective. Therefore, Theorem 6.23 says that
the map

$$\overline{\tau} : \mathbb{Z}_2 \to \mathbb{Z}_p^*/H, \quad [z]_2 \mapsto [\alpha^z]_H$$

is a group isomorphism, under which $[0]_2$ corresponds to $H$, and $[1]_2$ corresponds
to $\overline{H}$.

This isomorphism gives another way to derive Theorem 2.23, which says that
in $\mathbb{Z}_p^*$, the product of two non-squares is a square; indeed, the statement “non-zero
plus non-zero equals zero in $\mathbb{Z}_2$” translates via the isomorphism $\overline{\tau}$ to the statement
“non-square times non-square equals square in $\mathbb{Z}_p^*$.” □
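The facts used in Example 6.51 are easy to confirm by brute force for a small odd prime. In the minimal Python sketch below (function names are hypothetical), residues $1, \ldots, p - 1$ stand for the elements of $\mathbb{Z}_p^*$, and the map `tau` encodes the coset $H$ as 0 and the coset $\overline{H}$ as 1, so that the isomorphism $\overline{\tau}$ should reduce to $z \mapsto z \bmod 2$.

```python
def squares(p):
    """(Z_p^*)^2 as a set of residues: the image of the squaring map on Z_p^*."""
    return {(x * x) % p for x in range(1, p)}


def squaring_kernel(p):
    """Kernel of the squaring map: residues x in Z_p^* with x^2 = 1."""
    return {x for x in range(1, p) if (x * x) % p == 1}


def tau(z, alpha, p):
    """tau(z) = [alpha^z]_H for z >= 0, with the coset H encoded as 0 and the
    other coset as 1; alpha should be a non-square modulo p."""
    return 0 if pow(alpha, z, p) in squares(p) else 1


p = 11
H = squares(p)
nonsquares = set(range(1, p)) - H
print(sorted(H))                   # [1, 3, 4, 5, 9]
print(sorted(squaring_kernel(p)))  # [1, 10]
print(all((a * b) % p in H for a in nonsquares for b in nonsquares))  # True
```

For $p = 11$ this shows $|(\mathbb{Z}_{11}^*)^2| = 5 = (p - 1)/2$, $\operatorname{Ker} \rho = \{[\pm 1]\}$, and that every product of two non-squares is a square.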

Example 6.52. Let $\mathbb{Q}^*$ be the multiplicative group of non-zero rational numbers.
Let $H_1$ be the subgroup $\{\pm 1\}$, and let $H_2$ be the subgroup of positive rationals. It
is easy to see that $\mathbb{Q}^* = H_1 \cdot H_2$ and that $H_1 \cap H_2 = \{1\}$. Thus, $\mathbb{Q}^*$ is the internal
direct product of $H_1$ and $H_2$, and Theorem 6.25 gives us a group isomorphism
$\mathbb{Q}^* \cong H_1 \times H_2$. □





Let $G$ and $G'$ be abelian groups. Recall from Example 6.19 that $\operatorname{Map}(G, G')$
is the group of all functions $\sigma : G \to G'$, where the group operation is defined
point-wise using the group operation of $G'$:

$$(\sigma + \tau)(a) = \sigma(a) + \tau(a) \quad \text{and} \quad (-\sigma)(a) = -\sigma(a)$$

for all $\sigma, \tau \in \operatorname{Map}(G, G')$ and all $a \in G$. The following theorem isolates an
important subgroup of this group.

Theorem 6.26. Let $G$ and $G'$ be abelian groups, and consider the group of
functions $\operatorname{Map}(G, G')$. Then

$$\operatorname{Hom}(G, G') := \{\sigma \in \operatorname{Map}(G, G') : \sigma \text{ is a group homomorphism}\}$$

is a subgroup of $\operatorname{Map}(G, G')$.

Proof. First, observe that $\operatorname{Hom}(G, G')$ is non-empty, as it contains the map that
sends everything in $G$ to $0_{G'}$ (this is the identity element of $\operatorname{Map}(G, G')$).

Next, we have to show that if $\sigma$ and $\tau$ are homomorphisms from $G$ to $G'$, then
so are $\sigma + \tau$ and $-\sigma$. But $\sigma + \tau = \rho_2 \circ \rho_1$, where $\rho_1 : G \to G' \times G'$ is the map
constructed in Theorem 6.21, applied with $\sigma$ and $\tau$, and $\rho_2 : G' \times G' \to G'$ is as in
Example 6.45. Also, $-\sigma = \rho_{-1} \circ \sigma$, where $\rho_{-1}$ is the $(-1)$-multiplication map. □

Exercise 6.22. Verify that the “is isomorphic to” relation on abelian groups is
an equivalence relation; that is, for all abelian groups $G_1, G_2, G_3$, we have:

(a) $G_1 \cong G_1$,

(b) $G_1 \cong G_2$ implies $G_2 \cong G_1$,

(c) $G_1 \cong G_2$ and $G_2 \cong G_3$ implies $G_1 \cong G_3$.

Exercise 6.23. Let $\rho_i : G_i \to G_i'$, for $i = 1, \ldots, k$, be group homomorphisms.
Show that the map

$$\rho : G_1 \times \cdots \times G_k \to G_1' \times \cdots \times G_k', \quad (a_1, \ldots, a_k) \mapsto (\rho_1(a_1), \ldots, \rho_k(a_k))$$

is a group homomorphism. Also show that if each $\rho_i$ is an isomorphism, then so is
$\rho$.

Exercise 6.24. Let $\rho : G \to G'$ be a group homomorphism. Let $H, K$ be
subgroups of $G$ and let $m$ be a positive integer. Show that $\rho(H + K) = \rho(H) + \rho(K)$
and $\rho(mH) = m\rho(H)$.

Exercise 6.25. Let $\rho : G \to G'$ be a group homomorphism. Let $H$ be a subgroup
of $G$, and let $\tau : H \to G'$ be the restriction of $\rho$ to $H$. Show that $\tau$ is a group
homomorphism and that $\operatorname{Ker} \tau = \operatorname{Ker} \rho \cap H$.

Exercise 6.26. Suppose $G_1, \ldots, G_k$ are abelian groups. Show that for each
$i = 1, \ldots, k$, the projection map $\pi_i : G_1 \times \cdots \times G_k \to G_i$ that sends $(a_1, \ldots, a_k)$ to
$a_i$ is a surjective group homomorphism.

Exercise 6.27. Show that if $G = G_1 \times G_2$ for abelian groups $G_1$ and $G_2$, and $H_1$
is a subgroup of $G_1$ and $H_2$ is a subgroup of $G_2$, then we have a group isomorphism
$G/(H_1 \times H_2) \cong G_1/H_1 \times G_2/H_2$.

Exercise 6.28. Let $G$ be an abelian group with subgroups $H$ and $K$.

(a) Show that we have a group isomorphism $(H + K)/K \cong H/(H \cap K)$.

(b) Show that if $H$ and $K$ are finite, then $|H + K| = |H||K|/|H \cap K|$.

Exercise 6.29. Let $G$ be an abelian group with subgroups $H$, $K$, and $A$, where
$K \subseteq H$. Show that $(H \cap A)/(K \cap A)$ is isomorphic to a subgroup of $H/K$.

Exercise 6.30. Let $\rho : G \to G'$ be a group homomorphism with kernel $K$. Let
$H$ be a subgroup of $G$. Show that we have a group isomorphism
$G/(H + K) \cong \rho(G)/\rho(H)$.

Exercise 6.31. Let $\rho : G \to G'$ be a surjective group homomorphism. Let $S$ be
the set of all subgroups of $G$ that contain $\operatorname{Ker} \rho$, and let $S'$ be the set of all subgroups
of $G'$. Show that the sets $S$ and $S'$ are in one-to-one correspondence, via the map
that sends $H \in S$ to $\rho(H) \in S'$. Also show that this correspondence preserves
inclusions; that is, for all $H_1, H_2 \in S$, we have $H_1 \subseteq H_2 \iff \rho(H_1) \subseteq \rho(H_2)$.

Exercise 6.32. Use the previous exercise, together with Theorem 6.9, to get a 
short proof of Theorem 6.10. 

Exercise 6.33. Show that the homomorphism of Example 6.44 arises by direct 
application of Example 6.42, combined with Theorems 6.20 and 6.21. 

Exercise 6.34. Suppose that $G$, $G_1$, and $G_2$ are abelian groups, and that
$\rho : G_1 \times G_2 \to G$ is a group isomorphism. Let $H_1 := \rho(G_1 \times \{0_{G_2}\})$ and
$H_2 := \rho(\{0_{G_1}\} \times G_2)$. Show that $G$ is the internal direct product of $H_1$ and $H_2$.

Exercise 6.35. Let $\mathbb{Z}^+$ denote the set of positive integers, and let $\mathbb{Q}^*$ be the
multiplicative group of non-zero rational numbers. Consider the abelian groups
$\operatorname{Map}^{\#}(\mathbb{Z}^+, \mathbb{Z})$ and $\operatorname{Map}^{\#}(\mathbb{Z}^+, \mathbb{Z}_2)$, as defined in Exercise 6.14. Show that we have
group isomorphisms

(a) $\mathbb{Q}^* \cong \mathbb{Z}_2 \times \operatorname{Map}^{\#}(\mathbb{Z}^+, \mathbb{Z})$, and

(b) $\mathbb{Q}^*/(\mathbb{Q}^*)^2 \cong \operatorname{Map}^{\#}(\mathbb{Z}^+, \mathbb{Z}_2)$.

Exercise 6.36. Let $n$ be an odd, positive integer whose factorization into primes
is $n = p_1^{e_1} \cdots p_r^{e_r}$. Show that:

(a) we have a group isomorphism $\mathbb{Z}_n^*/(\mathbb{Z}_n^*)^2 \cong \mathbb{Z}_2^{\times r}$;

(b) if $p_i \equiv 3 \pmod{4}$ for each $i = 1, \ldots, r$, then the squaring map on $(\mathbb{Z}_n^*)^2$ is a
group automorphism.


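Exercise 6.37. Which of the following pairs of groups are isomorphic? Why or
why not? (a) $\mathbb{Z}_2 \times \mathbb{Z}_2$ and $\mathbb{Z}_4$, (b) $\mathbb{Z}_{12}^*$ and $\mathbb{Z}_8^*$, (c) $\mathbb{Z}_5^*$ and $\mathbb{Z}_4$, (d) $\mathbb{Z}_2 \times \mathbb{Z}$ and $\mathbb{Z}$, (e)
$\mathbb{Q}$ and $\mathbb{Z}$, (f) $\mathbb{Z} \times \mathbb{Z}$ and $\mathbb{Z}$.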


6.5 Cyclic groups 

Let $G$ be an abelian group. For $a \in G$, define $\langle a \rangle := \{za : z \in \mathbb{Z}\}$. It is easy
to see that $\langle a \rangle$ is a subgroup of $G$; indeed, it is the image of the group
homomorphism discussed in Example 6.42. Moreover, $\langle a \rangle$ is the smallest subgroup of
$G$ containing $a$; that is, $\langle a \rangle$ contains $a$, and every subgroup of $G$ that contains $a$
must contain everything in $\langle a \rangle$. Indeed, if a subgroup contains $a$, it must contain
$a + a = 2a$, $a + a + a = 3a$, and so on; it must also contain $0_G = 0a$, $-a = (-1)a$,
$(-a) + (-a) = (-2)a$, and so on. The subgroup $\langle a \rangle$ is called the subgroup (of $G$)
generated by $a$. Also, one defines the order of $a$ to be the order of the subgroup
$\langle a \rangle$.

More generally, for $a_1, \ldots, a_k \in G$, we define

$$\langle a_1, \ldots, a_k \rangle := \{z_1a_1 + \cdots + z_ka_k : z_1, \ldots, z_k \in \mathbb{Z}\}.$$

It is easy to see that $\langle a_1, \ldots, a_k \rangle$ is a subgroup of $G$; indeed, it is the image of
the group homomorphism discussed in Example 6.44. Moreover, this subgroup is
the smallest subgroup of $G$ that contains $a_1, \ldots, a_k$; that is, $\langle a_1, \ldots, a_k \rangle$ contains
the elements $a_1, \ldots, a_k$, and every subgroup of $G$ that contains these elements
must contain everything in $\langle a_1, \ldots, a_k \rangle$. The subgroup $\langle a_1, \ldots, a_k \rangle$ is called the
subgroup (of $G$) generated by $a_1, \ldots, a_k$.

An abelian group $G$ is called cyclic if $G = \langle a \rangle$ for some $a \in G$, in which case,
$a$ is called a generator for $G$. An abelian group $G$ is called finitely generated if
$G = \langle a_1, \ldots, a_k \rangle$ for some $a_1, \ldots, a_k \in G$.

Multiplicative notation: if $G$ is written multiplicatively, then $\langle a \rangle := \{a^z : z \in \mathbb{Z}\}$,
and $\langle a_1, \ldots, a_k \rangle := \{a_1^{z_1} \cdots a_k^{z_k} : z_1, \ldots, z_k \in \mathbb{Z}\}$; also, for emphasis and clarity,
we use the term multiplicative order of $a$.

Example 6.53. Consider the additive group $\mathbb{Z}$. This is a cyclic group, with 1 being
a generator:

$$\langle 1 \rangle = \{z \cdot 1 : z \in \mathbb{Z}\} = \{z : z \in \mathbb{Z}\} = \mathbb{Z}.$$

For every $m \in \mathbb{Z}$, we have

$$\langle m \rangle = \{zm : z \in \mathbb{Z}\} = \{mz : z \in \mathbb{Z}\} = m\mathbb{Z}.$$

It follows that the only elements of $\mathbb{Z}$ that generate $\mathbb{Z}$ are 1 and $-1$: every other
element generates a subgroup that is strictly contained in $\mathbb{Z}$. □

Example 6.54. For $n > 0$, consider the additive group $\mathbb{Z}_n$. This is a cyclic group,
with $[1]$ being a generator:

$$\langle [1] \rangle = \{z[1] : z \in \mathbb{Z}\} = \{[z] : z \in \mathbb{Z}\} = \mathbb{Z}_n.$$

For every $m \in \mathbb{Z}$, we have

$$\langle [m] \rangle = \{z[m] : z \in \mathbb{Z}\} = \{[zm] : z \in \mathbb{Z}\} = \{m[z] : z \in \mathbb{Z}\} = m\mathbb{Z}_n.$$

By Example 6.23, the subgroup $m\mathbb{Z}_n$ has order $n/\gcd(m, n)$. Thus, $[m]$ has order
$n/\gcd(m, n)$; in particular, $[m]$ generates $\mathbb{Z}_n$ if and only if $m$ is relatively prime to
$n$, and hence, the number of generators of $\mathbb{Z}_n$ is $\varphi(n)$. □
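The formula for the order of $[m]$ and the count of generators in Example 6.54 can be checked by brute force. In this Python sketch (function names hypothetical), `additive_order` searches for the least $k \ge 1$ with $km \equiv 0 \pmod{n}$, and `generators` keeps those $m$ whose multiples $z[m]$ exhaust $\mathbb{Z}_n$; neither uses the formula being tested.

```python
from math import gcd


def additive_order(m, n):
    """Order of [m] in Z_n: the least k >= 1 with k*m = 0 (mod n), by brute force."""
    k = 1
    while (k * m) % n != 0:
        k += 1
    return k


def generators(n):
    """All m in 0..n-1 with <[m]> = Z_n, found by listing the multiples z[m]."""
    return [m for m in range(n) if len({(z * m) % n for z in range(n)}) == n]


def phi(n):
    """Euler's phi function, by direct count."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


print(additive_order(4, 6))  # 3 = 6 / gcd(4, 6)
print(generators(12))        # [1, 5, 7, 11]
print(phi(12))               # 4
```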

Implicit in Examples 6.53 and 6.54 is the following general fact:

Theorem 6.27. Let $G$ be a cyclic group generated by $a$. Then for every $m \in \mathbb{Z}$,
we have

$$\langle ma \rangle = mG.$$

Proof. We have

$$\langle ma \rangle = \{z(ma) : z \in \mathbb{Z}\} = \{m(za) : z \in \mathbb{Z}\} = m\langle a \rangle = mG. \quad □$$

The following two examples present some groups that are not cyclic.

Example 6.55. Consider the additive group $G := \mathbb{Z} \times \mathbb{Z}$. Set

$$a_1 := (1, 0) \in G \quad \text{and} \quad a_2 := (0, 1) \in G.$$

It is not hard to see that $G = \langle a_1, a_2 \rangle$, since for all $z_1, z_2 \in \mathbb{Z}$, we have

$$z_1a_1 + z_2a_2 = (z_1, 0) + (0, z_2) = (z_1, z_2).$$

However, $G$ is not cyclic. To see this, let $\beta = (b_1, b_2)$ be an arbitrary element of $G$.
We claim that one of $a_1$ or $a_2$ does not belong to $\langle \beta \rangle$. Suppose to the contrary that
both $a_1$ and $a_2$ belong to $\langle \beta \rangle$. This would imply that there exist integers $z$ and $z'$
such that

$$zb_1 = 1, \qquad zb_2 = 0,$$
$$z'b_1 = 0, \qquad z'b_2 = 1.$$

Multiplying the upper left equality by the lower right, and the upper right by the
lower left, we obtain

$$1 = zz'b_1b_2 = 0,$$

which is impossible. □

Example 6.56. Consider the additive group $G := \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$. Set

$$a_1 := ([1]_{n_1}, [0]_{n_2}) \in G \quad \text{and} \quad a_2 := ([0]_{n_1}, [1]_{n_2}) \in G.$$

It is not hard to see that $G = \langle a_1, a_2 \rangle$, since for all $z_1, z_2 \in \mathbb{Z}$, we have

$$z_1a_1 + z_2a_2 = ([z_1]_{n_1}, [0]_{n_2}) + ([0]_{n_1}, [z_2]_{n_2}) = ([z_1]_{n_1}, [z_2]_{n_2}).$$

However, $G$ may or may not be cyclic: it depends on $d := \gcd(n_1, n_2)$.

If $d = 1$, then $G$ is cyclic, with $a := ([1]_{n_1}, [1]_{n_2})$ being a generator. One can
see this easily using the Chinese remainder theorem: for all $z_1, z_2 \in \mathbb{Z}$, there exists
$z \in \mathbb{Z}$ such that

$$z \equiv z_1 \pmod{n_1} \quad \text{and} \quad z \equiv z_2 \pmod{n_2},$$

which implies

$$za = ([z]_{n_1}, [z]_{n_2}) = ([z_1]_{n_1}, [z_2]_{n_2}).$$

If $d > 1$, then $G$ is not cyclic. To see this, let $\beta = ([b_1]_{n_1}, [b_2]_{n_2})$ be an arbitrary
element of $G$. We claim that one of $a_1$ or $a_2$ does not belong to $\langle \beta \rangle$. Suppose to
the contrary that both $a_1$ and $a_2$ belong to $\langle \beta \rangle$. This would imply that there exist
integers $z$ and $z'$ such that

$$zb_1 \equiv 1 \pmod{n_1}, \qquad zb_2 \equiv 0 \pmod{n_2},$$
$$z'b_1 \equiv 0 \pmod{n_1}, \qquad z'b_2 \equiv 1 \pmod{n_2}.$$

All of these congruences hold modulo $d$ as well, and multiplying the upper left
congruence by the lower right, and the upper right by the lower left, we obtain

$$1 \equiv zz'b_1b_2 \equiv 0 \pmod{d},$$

which is impossible. □
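For small moduli, the dichotomy in Example 6.56 can be confirmed by exhaustive search. The Python sketch below (function names hypothetical) relies only on the fact that a finite group is cyclic exactly when some element has order equal to the order of the group. It examines every element and is meant as a check for small $n_1, n_2$, not as an efficient test.

```python
from math import gcd


def element_order(b1, b2, n1, n2):
    """Order of ([b1], [b2]) in Z_{n1} x Z_{n2}: least k >= 1 killing both coordinates."""
    k = 1
    while (k * b1) % n1 != 0 or (k * b2) % n2 != 0:
        k += 1
    return k


def is_cyclic_product(n1, n2):
    """Brute force: the finite group Z_{n1} x Z_{n2} is cyclic iff some element has order n1*n2."""
    return any(element_order(b1, b2, n1, n2) == n1 * n2
               for b1 in range(n1) for b2 in range(n2))


print(is_cyclic_product(4, 9))  # True, gcd(4, 9) = 1
print(is_cyclic_product(4, 6))  # False, gcd(4, 6) = 2
```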

It should be clear that since a group isomorphism preserves all structural
properties of groups, it preserves the property of being cyclic. We state this, along with
related facts, as a theorem.

Theorem 6.28. Let $\rho : G \to G'$ be a group isomorphism.

(i) For all $a \in G$, we have $\rho(\langle a \rangle) = \langle \rho(a) \rangle$.

(ii) For all $a \in G$, $a$ and $\rho(a)$ have the same order.

(iii) $G$ is cyclic if and only if $G'$ is cyclic.

Proof. For all $a \in G$, we have

$$\rho(\langle a \rangle) = \{\rho(za) : z \in \mathbb{Z}\} = \{z\rho(a) : z \in \mathbb{Z}\} = \langle \rho(a) \rangle.$$

That proves (i).

(ii) follows from (i) and the fact that $\rho$ is injective.

(iii) follows from (i), as follows. If $G$ is cyclic, then $G = \langle a \rangle$, and since $\rho$ is
surjective, we have $G' = \rho(G) = \langle \rho(a) \rangle$. The converse follows by applying the
same argument to the inverse isomorphism $\rho^{-1} : G' \to G$. □

Example 6.57. Consider again the additive group $G := \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$, discussed in
Example 6.56. If $\gcd(n_1, n_2) = 1$, then one can also see that $G$ is cyclic as follows:
by the discussion in Example 6.48, we know that $G$ is isomorphic to $\mathbb{Z}_{n_1n_2}$, and
since $\mathbb{Z}_{n_1n_2}$ is cyclic, so is $G$. □

Example 6.58. Consider again the subgroup $m\mathbb{Z}_n$ of $\mathbb{Z}_n$, discussed in
Example 6.54. One can also see that this is cyclic of order $n/d$, where $d := \gcd(m, n)$, as
follows: in Example 6.50, we constructed an isomorphism between $\mathbb{Z}_{n/d}$ and $m\mathbb{Z}_n$,
and this implies $m\mathbb{Z}_n$ is cyclic of order $n/d$. □

Classification of cyclic groups. Examples 6.53 and 6.54 are extremely important
examples of cyclic groups. Indeed, as we shall now demonstrate, every cyclic
group is isomorphic either to $\mathbb{Z}$ or to $\mathbb{Z}_n$ for some $n > 0$.

Suppose that $G$ is a cyclic group with generator $a$. Consider the map $\rho : \mathbb{Z} \to G$
that sends $z \in \mathbb{Z}$ to $za \in G$. As discussed in Example 6.42, this map is a group
homomorphism, and since $a$ is a generator for $G$, it must be surjective. There are
two cases to consider.

Case 1: $\operatorname{Ker} \rho = \{0\}$. In this case, $\rho$ is an isomorphism of $\mathbb{Z}$ with $G$.

Case 2: $\operatorname{Ker} \rho \neq \{0\}$. In this case, since $\operatorname{Ker} \rho$ is a subgroup of $\mathbb{Z}$ different from
$\{0\}$, by Theorem 6.9, it must be of the form $n\mathbb{Z}$ for some $n > 0$. Hence, by
Theorem 6.23, the map $\overline{\rho} : \mathbb{Z}_n \to G$ that sends $[z]_n$ to $za$ is an isomorphism
of $\mathbb{Z}_n$ with $G$.

Based on this isomorphism, we immediately obtain:

Theorem 6.29. Let $G$ be an abelian group and let $a \in G$. If there exists a positive
integer $m$ such that $ma = 0_G$, then the least such positive integer $n$ is the order of
$a$; in this case, we have:

• for every integer $z$, $za = 0_G$ if and only if $n$ divides $z$, and more generally,
for all integers $z_1, z_2$, we have $z_1a = z_2a$ if and only if $z_1 \equiv z_2 \pmod{n}$;

• the subgroup $\langle a \rangle$ consists of the $n$ distinct elements

$$0 \cdot a, \ 1 \cdot a, \ \ldots, \ (n - 1) \cdot a.$$

Otherwise, $a$ has infinite order, and every element of $\langle a \rangle$ can be expressed as $za$
for some unique integer $z$.
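In multiplicative notation, Theorem 6.29 says that the multiplicative order of $\alpha$ is the least $k \ge 1$ with $\alpha^k = 1$. The following brute-force Python sketch (the function name is hypothetical) computes this for $\alpha = [a] \in \mathbb{Z}_n^*$ and checks both bullet points for one example. It performs up to $k$ multiplications, so it is only sensible for small $n$.

```python
from math import gcd


def multiplicative_order(a, n):
    """Least k >= 1 with a^k = 1 (mod n), for n >= 2 and gcd(a, n) = 1."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and gcd(a, n) = 1")
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


a, n = 2, 7
k = multiplicative_order(a, n)
print(k)  # 3, since 2^3 = 8 = 1 (mod 7)
# First bullet: a^z = 1 exactly when k divides z.
print(all((pow(a, z, n) == 1) == (z % k == 0) for z in range(30)))  # True
# Second bullet: a^0, ..., a^(k-1) are k distinct elements.
print(sorted(pow(a, z, n) for z in range(k)))  # [1, 2, 4]
```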

In the case where the group is finite, we can say more:

Theorem 6.30. Let $G$ be a finite abelian group and let $a \in G$. Then $|G|a = 0_G$
and the order of $a$ divides $|G|$.

Proof. Since $\langle a \rangle$ is a subgroup of $G$, by Lagrange’s theorem (Theorem 6.15), the
order of $a$ divides $|G|$. It then follows by Theorem 6.29 that $|G|a = 0_G$. □

Example 6.59. Let $a, n \in \mathbb{Z}$ with $n > 0$ and $\gcd(a, n) = 1$, and let $\alpha := [a] \in \mathbb{Z}_n^*$.
Theorem 6.29 implies that the definition given in this section of the multiplicative
order of $\alpha$ is consistent with that given in §2.7. Moreover, Euler’s theorem
(Theorem 2.13) can be seen as just a special case of Theorem 6.30. Also, note that $\alpha$ is a
generator for $\mathbb{Z}_n^*$ if and only if $a$ is a primitive root modulo $n$. □

Example 6.60. As we saw in Example 6.26, all elements of $\mathbb{Z}_{15}^*$ have multiplicative
order dividing 4, and since $\mathbb{Z}_{15}^*$ has order 8, we conclude that $\mathbb{Z}_{15}^*$ is not cyclic. □

Example 6.61. The group $\mathbb{Z}_5^*$ is cyclic, with $[2]$ being a generator:

$$[2]^2 = [4] = [-1], \quad [2]^3 = [-2], \quad [2]^4 = [1]. \quad □$$

Example 6.62. Based on the calculations in Example 2.9, we may conclude that
$\mathbb{Z}_7^*$ is cyclic, with both $[3]$ and $[5]$ being generators. □
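The conclusions of Examples 6.60 to 6.62 can be reproduced by listing, for small $n$, all elements $[a] \in \mathbb{Z}_n^*$ whose powers exhaust $\mathbb{Z}_n^*$, that is, all primitive roots modulo $n$. The Python sketch below does this by brute force; the function names are hypothetical.

```python
from math import gcd


def units(n):
    """Representatives 1..n-1 of the elements of Z_n^* (n >= 2)."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def unit_generators(n):
    """All a with <[a]> = Z_n^*, i.e. the primitive roots modulo n, by brute force."""
    U = set(units(n))
    gens = []
    for a in sorted(U):
        # a^1, ..., a^|U| lists all of <[a]>, since a^|U| = 1 by Theorem 6.30
        powers = {pow(a, z, n) for z in range(1, len(U) + 1)}
        if powers == U:
            gens.append(a)
    return gens


print(unit_generators(5))   # [2, 3]
print(unit_generators(7))   # [3, 5]
print(unit_generators(15))  # []
```

The empty list for $n = 15$ agrees with Example 6.60: $\mathbb{Z}_{15}^*$ has no generator.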

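Example 6.63. Consider again the additive group $G := \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$, discussed in
Example 6.56. If $d := \gcd(n_1, n_2) > 1$, then one can also see that $G$ is not cyclic as
follows: for every $\beta \in G$, we have $(n_1n_2/d)\beta = 0_G$, and hence by Theorem 6.29,
the order of $\beta$ divides $n_1n_2/d$, which is strictly less than $|G| = n_1n_2$; thus, no
element of $G$ generates $G$. □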

The following two theorems completely characterize the subgroup structure of
cyclic groups. Actually, we have already proven most of the results in these two
theorems, but nevertheless, they deserve special emphasis.

Theorem 6.31. Let $G$ be a cyclic group of infinite order.

(i) $G$ is isomorphic to $\mathbb{Z}$.

(ii) There is a one-to-one correspondence between the non-negative integers
and the subgroups of $G$, where each such integer $m$ corresponds to the
cyclic group $mG$.

(iii) For every two non-negative integers $m, m'$, we have $mG \subseteq m'G$ if and only
if $m' \mid m$.

Proof. That $G \cong \mathbb{Z}$ was established in our classification of cyclic groups, and so
it suffices to prove the other statements of the theorem for $G = \mathbb{Z}$. As we saw in
Example 6.53, for every integer $m$, the subgroup $m\mathbb{Z}$ is cyclic, as it is generated by
$m$. This fact, together with Theorem 6.9, establishes all the other statements. □

Theorem 6.32. Let $G$ be a cyclic group of finite order $n$.

(i) $G$ is isomorphic to $\mathbb{Z}_n$.

(ii) There is a one-to-one correspondence between the positive divisors of $n$
and the subgroups of $G$, where each such divisor $d$ corresponds to the
subgroup $dG$; moreover, $dG$ is a cyclic group of order $n/d$.

(iii) For each positive divisor $d$ of $n$, we have $dG = G\{n/d\}$; that is, the
kernel of the $(n/d)$-multiplication map is equal to the image of the
$d$-multiplication map; in particular, $G\{n/d\}$ has order $n/d$.

(iv) For every two positive divisors $d, d'$ of $n$, we have $dG \subseteq d'G$ if and only if
$d' \mid d$.

(v) For every positive divisor $d$ of $n$, the number of elements of order $d$ in $G$
is $\varphi(d)$.

(vi) For every integer $m$, we have $mG = dG$ and $G\{m\} = G\{d\}$, where
$d := \gcd(m, n)$.

Proof. That $G \cong \mathbb{Z}_n$ was established in our classification of cyclic groups, and so
it suffices to prove the other statements of the theorem for $G = \mathbb{Z}_n$.

The one-to-one correspondence in part (ii) was established in Theorem 6.10. By
the discussion in Example 6.54, it is clear that $d\mathbb{Z}_n$ is generated by $[d]$ and has
order $n/d$.

Part (iii) was established in Example 6.23.

Part (iv) was established in Theorem 6.10.

For part (v), the elements of order $d$ in $\mathbb{Z}_n$ are all contained in $\mathbb{Z}_n\{d\}$, and so
the number of such elements is equal to the number of generators of $\mathbb{Z}_n\{d\}$. The
group $\mathbb{Z}_n\{d\}$ is cyclic of order $d$, and so is isomorphic to $\mathbb{Z}_d$, and as we saw in
Example 6.54, this group has $\varphi(d)$ generators.

Part (vi) was established in Example 6.23. □
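Parts (ii) and (v) of Theorem 6.32 can be spot-checked for $G = \mathbb{Z}_n$. The Python sketch below (function names hypothetical) tallies element orders using the formula $\operatorname{ord}([m]) = n/\gcd(m, n)$ from Example 6.54; the accompanying checks compare each tally with $\varphi(d)$ and compare the size of each subgroup $d\mathbb{Z}_n$ with $n/d$.

```python
from math import gcd


def phi(n):
    """Euler's phi function, by direct count."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order_counts(n):
    """For the cyclic group Z_n, map each d to the number of elements of order d.
    Uses the formula order([m]) = n / gcd(m, n) from Example 6.54."""
    counts = {}
    for m in range(n):
        d = n // gcd(m, n)
        counts[d] = counts.get(d, 0) + 1
    return counts


print(sorted(order_counts(12).items()))
# [(1, 1), (2, 1), (3, 2), (4, 2), (6, 2), (12, 4)]
```

For $n = 12$, the counts $1, 1, 2, 2, 2, 4$ are exactly $\varphi(1), \varphi(2), \varphi(3), \varphi(4), \varphi(6), \varphi(12)$, and they sum to 12.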

Since cyclic groups are in some sense the simplest kind of abelian group, it is
nice to establish some sufficient conditions under which a group must be cyclic.
The following three theorems provide such conditions.

Theorem 6.33. If $G$ is an abelian group of prime order, then $G$ is cyclic.

Proof. Let $|G| = p$, which, by hypothesis, is prime. Let $a \in G$ with $a \neq 0_G$, and
let $k$ be the order of $a$. As the order of an element divides the order of the group,
we have $k \mid p$, and so $k = 1$ or $k = p$. Since $a \neq 0_G$, we must have $k \neq 1$, and so
$k = p$, which implies that $a$ generates $G$. □

Theorem 6.34. If $G_1$ and $G_2$ are finite cyclic groups of relatively prime order, then
$G_1 \times G_2$ is also cyclic. In particular, if $G_1$ is generated by $a_1$ and $G_2$ is generated
by $a_2$, then $G_1 \times G_2$ is generated by $(a_1, a_2)$.

Proof. We give a direct proof, based on Theorem 6.29. Let $n_1 := |G_1|$ and
$n_2 := |G_2|$, where $\gcd(n_1, n_2) = 1$. Also, let $a_1 \in G_1$ have order $n_1$ and $a_2 \in G_2$
have order $n_2$. We want to show that $(a_1, a_2)$ has order $n_1n_2$. Applying
Theorem 6.29 to $(a_1, a_2)$, we see that the order of $(a_1, a_2)$ is the smallest positive
integer $k$ such that $k(a_1, a_2) = (0_{G_1}, 0_{G_2})$. Now, for every integer $k$, we have
$k(a_1, a_2) = (ka_1, ka_2)$, and

$$(ka_1, ka_2) = (0_{G_1}, 0_{G_2}) \iff n_1 \mid k \text{ and } n_2 \mid k \quad \text{(applying Theorem 6.29 to } a_1 \text{ and } a_2\text{)}$$
$$\iff n_1n_2 \mid k \quad \text{(since } \gcd(n_1, n_2) = 1\text{)}. \quad □$$

Theorem 6.35. Let $G$ be a cyclic group. Then for every subgroup $H$ of $G$, both
$H$ and $G/H$ are cyclic.

Proof. The fact that $H$ is cyclic follows from part (ii) of Theorem 6.31 in the case
where $G$ is infinite, and part (ii) of Theorem 6.32 in the case where $G$ is finite. If
$G$ is generated by $a$, then it is easy to see that $G/H$ is generated by $[a]_H$. □

The next three theorems are often useful in calculating the order of a group
element. The first generalizes Theorem 2.15.

Theorem 6.36. Let $G$ be an abelian group, let $a \in G$ be of finite order $n$, and let
$m$ be an arbitrary integer. Then the order of $ma$ is $n/\gcd(m, n)$.

Proof. Let $H := \langle a \rangle$, and $d := \gcd(m, n)$. By Theorem 6.27, we have $\langle ma \rangle = mH$,
and by Theorem 6.32, we have $mH = dH$, which has order $n/d$.

That proves the theorem. Alternatively, we can give a direct proof, based on
Theorem 6.29. Applying Theorem 6.29 to $ma$, we see that the order of $ma$ is the
smallest positive integer $k$ such that $k(ma) = 0_G$. Now, for every integer $k$, we
have $k(ma) = (km)a$, and

$$(km)a = 0_G \iff km \equiv 0 \pmod{n} \quad \text{(applying Theorem 6.29 to } a\text{)}$$
$$\iff k \equiv 0 \pmod{n/\gcd(m, n)} \quad \text{(by part (ii) of Theorem 2.5)}. \quad □$$
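Written multiplicatively, Theorem 6.36 says that if $\alpha$ has order $k$, then $\alpha^m$ has order $k/\gcd(m, k)$. The Python sketch below (function name hypothetical) checks this in $\mathbb{Z}_{31}^*$ with $\alpha = [3]$, which has order 30, by computing each order directly rather than from the formula.

```python
from math import gcd


def mult_order(a, n):
    """Least k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


n, a = 31, 3
k = mult_order(a, n)
print(k)  # 30, so [3] generates Z_31^*
orders = [mult_order(pow(a, m, n), n) for m in (1, 2, 5, 6, 10, 15)]
print(orders)                                              # [30, 15, 6, 5, 3, 2]
print([k // gcd(m, k) for m in (1, 2, 5, 6, 10, 15)])      # the same list
```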

Theorem 6.37. Suppose that $a$ is an element of an abelian group, and for some
prime $p$ and integer $e \ge 1$, we have $p^e a = 0_G$ and $p^{e-1} a \neq 0_G$. Then $a$ has order
$p^e$.

Proof. If $m$ is the order of $a$, then since $p^e a = 0_G$, we have $m \mid p^e$. So $m = p^f$ for
some $f = 0, \ldots, e$. If $f < e$, then $m \mid p^{e-1}$, and so $p^{e-1} a = 0_G$, contradicting the
assumption that $p^{e-1} a \neq 0_G$. □

Theorem 6.38. Suppose $G$ is an abelian group with $a_1, a_2 \in G$ such that $a_1$ is
of finite order $n_1$, $a_2$ is of finite order $n_2$, and $\gcd(n_1, n_2) = 1$. Then the order of
$a_1 + a_2$ is $n_1n_2$.

Proof. Let $H_1 := \langle a_1 \rangle$ and $H_2 := \langle a_2 \rangle$ so that $|H_1| = n_1$ and $|H_2| = n_2$.

First, we claim that $H_1 \cap H_2 = \{0_G\}$. To see this, observe that $H_1 \cap H_2$ is a
subgroup of $H_1$, and so $|H_1 \cap H_2|$ divides $n_1$; similarly, $|H_1 \cap H_2|$ divides $n_2$. Since
$\gcd(n_1, n_2) = 1$, we must have $|H_1 \cap H_2| = 1$, and that proves the claim.

Using the claim, we can apply Theorem 6.25, obtaining a group isomorphism
between $H_1 + H_2$ and $H_1 \times H_2$. Under this isomorphism, the group element
$a_1 + a_2 \in H_1 + H_2$ corresponds to $(a_1, a_2) \in H_1 \times H_2$, which by Theorem 6.34
(again using the fact that $\gcd(n_1, n_2) = 1$) has order $n_1n_2$. □

For an abelian group $G$, we say that an integer $k$ kills $G$ if $kG = \{0_G\}$. Consider
the set $K_G$ of integers that kill $G$. Evidently, $K_G$ is a subgroup of $\mathbb{Z}$, and hence of
the form $m\mathbb{Z}$ for a uniquely determined non-negative integer $m$. This integer $m$ is
called the exponent of $G$. If $m \neq 0$, then we see that $m$ is the least positive integer
that kills $G$.

The following two theorems state some simple properties of the exponent of a
group.

Theorem 6.39. Let $G$ be an abelian group of exponent $m$.

(i) For every integer $k$, $k$ kills $G$ if and only if $m \mid k$.

(ii) If $G$ has finite order, then $m$ divides $|G|$.

(iii) If $m \neq 0$, then for every $a \in G$, the order of $a$ is finite and divides $m$.

(iv) If $G$ is cyclic, then the exponent of $G$ is 0 if $G$ is infinite, and is $|G|$ if $G$
is finite.

Proof. Exercise. □

Theorem 6.40. If $G_1$ and $G_2$ are abelian groups of exponents $m_1$ and $m_2$, then the
exponent of $G_1 \times G_2$ is $\operatorname{lcm}(m_1, m_2)$.

Proof. Exercise. □

Example 6.64. The additive group $\mathbb{Z}$ has exponent 0. □

Example 6.65. The additive group $\mathbb{Z}_n$ has exponent $n$. □

Example 6.66. The additive group $\mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$ has exponent $\operatorname{lcm}(n_1, n_2)$. □

Example 6.67. The multiplicative group $\mathbb{Z}_{15}^*$ has exponent 4 (see Example 6.26). □
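For a finite group, an integer kills $G$ exactly when it is a multiple of every element's order (Theorem 6.29), so the exponent is the least common multiple of all element orders. The Python sketch below (function names hypothetical) uses this to compute the exponent of $\mathbb{Z}_n^*$ by brute force, reproducing Example 6.67.

```python
from math import gcd


def mult_order(a, n):
    """Least k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


def exponent_of_units(n):
    """Exponent of Z_n^* (n >= 2): the lcm of the orders of all its elements."""
    e = 1
    for a in range(1, n):
        if gcd(a, n) == 1:
            k = mult_order(a, n)
            e = e * k // gcd(e, k)  # e := lcm(e, k)
    return e


print(exponent_of_units(15))  # 4, as in Example 6.67
print(exponent_of_units(7))   # 6, the order of Z_7^*
```

Comparing the exponent with $|\mathbb{Z}_n^*|$ gives a cyclicity test for small $n$: $\mathbb{Z}_{15}^*$ has order 8 but exponent 4.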

The next two theorems develop some crucial properties about the structure of 
finite abelian groups. 

Theorem 6.41. If an abelian group $G$ has non-zero exponent $m$, then $G$ contains
an element of order $m$. In particular, a finite abelian group is cyclic if and only if
its order equals its exponent.

Proof. The second statement follows immediately from the first. For the first
statement, let $m = p_1^{e_1} \cdots p_r^{e_r}$ be the prime factorization of $m$.

First, we claim that for each $i = 1, \ldots, r$, there exists $a_i \in G$ such that
$(m/p_i)a_i \neq 0_G$. Suppose the claim were false: then for some $i$, $(m/p_i)a = 0_G$ for all $a \in G$;
however, this contradicts the minimality property in the definition of the exponent
$m$. That proves the claim.

Let $a_1, \ldots, a_r$ be as in the above claim. Then by Theorem 6.37, $(m/p_i^{e_i})a_i$ has
order $p_i^{e_i}$ for each $i = 1, \ldots, r$. Finally, by Theorem 6.38 (applied repeatedly, since
the orders $p_1^{e_1}, \ldots, p_r^{e_r}$ are pairwise relatively prime), the group element

$$(m/p_1^{e_1})a_1 + \cdots + (m/p_r^{e_r})a_r$$

has order $m$. □
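The proof of Theorem 6.41 is constructive, and for $G = \mathbb{Z}_n^*$ (written multiplicatively) it can be carried out literally for small $n$. In the Python sketch below, all function names are hypothetical; the exponent is obtained as the lcm of the element orders, the factorization uses naive trial division, and each $a_i$ is found by search, so this illustrates the proof rather than an efficient algorithm.

```python
from math import gcd


def mult_order(a, n):
    """Least k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


def factor(m):
    """Trial-division factorization of m >= 1 as a dict {p: e}."""
    f, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return f


def element_of_exponent_order(n):
    """Return (g, m) where m is the exponent of Z_n^* and g has order m,
    built as in the proof of Theorem 6.41 (written multiplicatively)."""
    units = [a for a in range(1, n) if gcd(a, n) == 1]
    m = 1
    for a in units:  # the exponent is the lcm of all element orders
        k = mult_order(a, n)
        m = m * k // gcd(m, k)
    g = 1
    for p, e in factor(m).items():
        # the claim in the proof: some a_i satisfies a_i^(m/p) != 1
        a_i = next(a for a in units if pow(a, m // p, n) != 1)
        # by Theorem 6.37, a_i^(m/p^e) has order p^e; multiply these together
        g = g * pow(a_i, m // p**e, n) % n
    return g, m


print(element_of_exponent_order(15))  # (2, 4)
```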

Theorem 6.42. Let $G$ be a finite abelian group of order $n$. If $p$ is a prime dividing
$n$, then $G$ contains an element of order $p$.

Proof. We can prove this by induction on $n$.

If $n = 1$, then the theorem is vacuously true.

Now assume $n > 1$ and that the theorem holds for all groups of order strictly less
than $n$. Let $a$ be any non-zero element of $G$, and let $m$ be the order of $a$. Since $a$ is
non-zero, we must have $m > 1$. If $p \mid m$, then $(m/p)a$ is an element of order $p$, and
we are done. So assume that $p \nmid m$ and consider the quotient group $G/H$, where $H$
is the subgroup of $G$ generated by $a$. Since $H$ has order $m$, $G/H$ has order $n/m$,
which is strictly less than $n$, and since $p \nmid m$, we must have $p \mid (n/m)$. So we can
apply the induction hypothesis to the group $G/H$ and the prime $p$, which says that
there is an element $b \in G$ such that the coset $[b]_H \in G/H$ has order $p$. If $\ell$ is the
order of $b$, then $\ell b = 0_G$, and so $\ell b \equiv 0_G \pmod{H}$, which implies that the order of
$[b]_H$ divides $\ell$. Thus, $p \mid \ell$, and so $(\ell/p)b$ is an element of $G$ of order $p$. □


As a corollary, we have: 

Theorem 6.43. Let $G$ be a finite abelian group. Then the primes dividing the
exponent of $G$ are the same as the primes dividing its order.

Proof. Since the exponent divides the order, every prime dividing the exponent
must divide the order. Conversely, if a prime $p$ divides the order, then since there
is an element of order $p$ in the group, the exponent must be divisible by $p$. □

Exercise 6.38. Find $\alpha_1, \alpha_2 \in \mathbb{Z}_{15}^*$ such that $\mathbb{Z}_{15}^* = \langle \alpha_1, \alpha_2 \rangle$.

Exercise 6.39. Show that $\mathbb{Q}^*$ is not finitely generated.

Exercise 6.40. Let $G$ be an abelian group, $a \in G$, and $m \in \mathbb{Z}$, such that $m > 0$
and $ma = 0_G$. Let $m = p_1^{e_1} \cdots p_r^{e_r}$ be the prime factorization of $m$. For $i = 1, \ldots, r$,
let $f_i$ be the largest non-negative integer such that $f_i \le e_i$ and $(m/p_i^{f_i}) \cdot a = 0_G$.
Show that the order of $a$ is equal to $p_1^{e_1 - f_1} \cdots p_r^{e_r - f_r}$.
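The recipe in Exercise 6.40 is a practical way to compute an order when a multiple $m$ of it, together with the factorization of $m$, is known: it needs at most $e_1 + \cdots + e_r$ exponentiations instead of a search up to $m$. A Python sketch for $G = \mathbb{Z}_n^*$ in multiplicative notation follows; the function name and the dictionary encoding of the factorization are our own choices, and the sketch illustrates the statement without proving it.

```python
def order_from_multiple(a, n, m, factorization):
    """Multiplicative order of [a] in Z_n^*, given m > 0 with a^m = 1 (mod n)
    and the factorization of m as a dict {p_i: e_i}; follows Exercise 6.40."""
    order = 1
    for p, e in factorization.items():
        # f = largest f <= e with a^(m / p^f) = 1; the condition holds for
        # f = 0, and once it fails it fails for every larger f
        f = 0
        while f < e and pow(a, m // p ** (f + 1), n) == 1:
            f += 1
        order *= p ** (e - f)
    return order


# Z_101^* has order 100 = 2^2 * 5^2, so m = 100 kills every element.
print([order_from_multiple(a, 101, 100, {2: 2, 5: 2}) for a in (1, 100, 10)])  # [1, 2, 4]
```

The sample values are easy to confirm by hand: $[100] = [-1]$ has order 2, and $10^2 = 100 \equiv -1 \pmod{101}$, so $[10]$ has order 4.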

Exercise 6.41. Let $G$ be an abelian group of order $n$, and let $m$ be an integer.
Show that $mG = G$ if and only if $\gcd(m, n) = 1$.

Exercise 6.42. Let H be a subgroup of an abelian group G. Show that: 

(a) if H and G/H are both finitely generated, then so is G; 

(b) if G is finite, gcd(|Ef |, \G/H\) = 1, and H and G/H are both cyclic, then 
G is cyclic. 

Exercise 6.43. Let G be an abelian group of exponent m\m 2 , where m\ and m 2 
are relatively prime. Show that G is the internal direct product of m \ G and miG. 

Exercise 6.44. Show how Theorem 2.40 easily follows from Theorem 6.32. 

Exercise 6.45. As additive groups, Z is clearly a subgroup of Q. Consider the 
quotient group G := Q/Z, and show that: 

(a) all elements of G have finite order; 

(b) G has exponent 0; 

(c) for all positive integers m, we have $mG = G$ and $G\{m\} \cong \mathbb{Z}_m$;

(d) all finite subgroups of G are cyclic.

Exercise 6.46. Suppose that G is an abelian group that satisfies the following 
properties: 

(i) for all $m \in \mathbb{Z}$, $G\{m\}$ is either equal to G or is of finite order;

(ii) for some $m \in \mathbb{Z}$, $\{0_G\} \subsetneq G\{m\} \subsetneq G$.

Show that $G\{m\}$ is finite for all non-zero $m \in \mathbb{Z}$.
6.6 The structure of finite abelian groups (*)

We next state a theorem that classifies all finite abelian groups up to isomorphism. 

Theorem 6.44 (Fundamental theorem of finite abelian groups). A finite abelian 
group (with more than one element) is isomorphic to a direct product of cyclic 
groups 

$$\mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}},$$

where the $p_i$'s are primes (not necessarily distinct) and the $e_i$'s are positive integers.
This direct product of cyclic groups is unique up to the order of the factors. 

An alternative statement of this theorem is the following: 

Theorem 6.45. A finite abelian group (with more than one element) is isomorphic
to a direct product of cyclic groups

$$\mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_t},$$

where each $m_i > 1$, and where for $i = 1, \ldots, t - 1$, we have $m_i \mid m_{i+1}$. Moreover,
the integers $m_1, \ldots, m_t$ are uniquely determined, and $m_t$ is the exponent of the
group.
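To make the relationship between the two forms concrete, here is a small Python sketch that takes the prime-power factors of Theorem 6.44, given as hypothetical (p, e) pairs standing for $\mathbb{Z}_{p^e}$, and assembles them into a list $m_1, \ldots, m_t$ of the kind described in Theorem 6.45. It multiplies together the largest power of each prime to form $m_t$, the next largest powers to form $m_{t-1}$, and so on. This is one possible construction, offered as an illustration; the function name `invariant_factors` is our own.

```python
from collections import defaultdict


def invariant_factors(prime_powers):
    """Combine factors Z_{p^e}, given as a non-empty list of (p, e) pairs with
    e >= 1, into [m_1, ..., m_t] with each m_i dividing m_{i+1}."""
    by_prime = defaultdict(list)
    for p, e in prime_powers:
        by_prime[p].append(e)
    t = max(len(es) for es in by_prime.values())
    ms = [1] * t
    for p, es in by_prime.items():
        # the largest exponent of p goes into m_t, the next into m_{t-1}, ...
        for j, e in enumerate(sorted(es, reverse=True)):
            ms[t - 1 - j] *= p ** e
    return ms


# Z_2 x Z_4 x Z_3 x Z_9 x Z_5 corresponds to Z_6 x Z_180
print(invariant_factors([(2, 1), (2, 2), (3, 1), (3, 2), (5, 1)]))  # [6, 180]
```

Each $m_i$ divides $m_{i+1}$ because, for every prime, the exponents are assigned in non-decreasing order from $m_1$ to $m_t$. Also, $m_t$ collects the largest power of each prime, so it is the least common multiple of all the factors, in line with the claim that $m_t$ is the exponent.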

The statements of these theorems are much more important than their proofs, 
which are a bit technical. Even if the reader does not study the proofs, he is urged 
to understand what the theorems actually say. 

In an exercise below, you are asked to show that these two theorems are equivalent. We now prove Theorem 6.45, which we break into two lemmas, the first of
which proves the existence part of the theorem, and the second of which proves the
uniqueness part.

Lemma 6.46. A finite abelian group (with more than one element) is isomorphic
to a direct product of cyclic groups

$$\mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_t},$$

where each $m_i > 1$, and where for $i = 1, \ldots, t - 1$, we have $m_i \mid m_{i+1}$; moreover,
$m_t$ is the exponent of the group.

Proof. Let G be a finite abelian group with more than one element, and let m be
the exponent of G. By Theorem 6.41, there exists an element $a \in G$ of order m.
Let $A = \langle a \rangle$. Then $A \cong \mathbb{Z}_m$. Now, if $A = G$, the lemma is proved. So assume that
$A \subsetneq G$.

We will show that there exists a subgroup B of G such that $G = A + B$ and
$A \cap B = \{0_G\}$. From this, Theorem 6.25 gives us an isomorphism of G with
$A \times B$. Moreover, the exponent of B is clearly a divisor of m, and so the lemma
will follow by induction (on the order of the group).

So it suffices to show the existence of a subgroup B as above. We prove this by
contradiction. Suppose that there is no such subgroup, and among all subgroups
B such that $A \cap B = \{0_G\}$, assume that B is maximal, meaning that there is
no subgroup B' of G such that $B \subsetneq B'$ and $A \cap B' = \{0_G\}$. By assumption,
$C := A + B \subsetneq G$.

Let d be any element of G that lies outside of C. Consider the quotient group
G/C, and let r be the order of $[d]_C \in G/C$. Note that $r > 1$ and $r \mid m$. We shall
define a group element d' with slightly nicer properties than d, as follows. Since
$rd \in C$, we have $rd = sa + b$ for some $s \in \mathbb{Z}$ and $b \in B$. We claim that $r \mid s$. To see
this, note that $0_G = md = (m/r)rd = (m/r)sa + (m/r)b$, and since $A \cap B = \{0_G\}$,
we have $(m/r)sa = 0_G$, which can only happen if $r \mid s$. That proves the claim.
This allows us to define $d' := d - (s/r)a$. Since $d \equiv d' \pmod{C}$, we see not only
that $[d']_C \in G/C$ has order r, but also that $rd' \in B$.

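We next show that $A \cap (B + \langle d' \rangle) = \{0_G\}$. Since $d' \equiv d \pmod{C}$ and $d \notin C$,
we have $d' \notin B$, so $B + \langle d' \rangle$ properly contains B; this will therefore contradict the
maximality of B, and thus prove the lemma. Because $A \cap B = \{0_G\}$, it will suffice to show
that $A \cap (B + \langle d' \rangle) \subseteq B$. Now, suppose we have a group element $b' + xd' \in A$, with
$b' \in B$ and $x \in \mathbb{Z}$. Then in particular, $xd' \in C$, and so $r \mid x$, since $[d']_C \in G/C$ has
order r. Further, since $rd' \in B$, we have $xd' \in B$, whence $b' + xd' \in B$. □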

Lemma 6.47. Suppose that $G := \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_t}$ and $H := \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_t}$
are isomorphic, where the $m_i$'s and $n_i$'s are positive integers (possibly 1) such that
$m_i \mid m_{i+1}$ and $n_i \mid n_{i+1}$ for $i = 1, \ldots, t - 1$. Then $m_i = n_i$ for $i = 1, \ldots, t$.

Proof. Clearly, $\prod_i m_i = |G| = |H| = \prod_i n_i$. We prove the lemma by induction on
the order of the group. If the group order is 1, then clearly all the $m_i$'s and $n_i$'s must
be 1, and we are done. Otherwise, let p be a prime dividing the group order. Now,
suppose that p divides $m_r, \ldots, m_t$ but not $m_1, \ldots, m_{r-1}$, and that p divides $n_s, \ldots, n_t$
but not $n_1, \ldots, n_{s-1}$, where $r \le t$ and $s \le t$. Evidently, the groups pG and pH are
isomorphic. Moreover,

$$pG \cong \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_{r-1}} \times \mathbb{Z}_{m_r/p} \times \cdots \times \mathbb{Z}_{m_t/p},$$

and

$$pH \cong \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_{s-1}} \times \mathbb{Z}_{n_s/p} \times \cdots \times \mathbb{Z}_{n_t/p}.$$

Thus, we see that $|pG| = |G|/p^{t-r+1}$ and $|pH| = |H|/p^{t-s+1}$, from which it follows
that $r = s$, and the lemma then follows by induction. □

Exercise 6.47. Show that Theorems 6.44 and 6.45 are equivalent; that is, show
that each one implies the other. To do this, give a natural one-to-one correspondence between sequences of prime powers (as in Theorem 6.44) and sequences of
integers $m_1, \ldots, m_t$ (as in Theorem 6.45).

Exercise 6.48. Using the fundamental theorem of finite abelian groups (either 
form), give short and simple proofs of Theorems 6.41 and 6.42. 

Exercise 6.49. In our proof of Euler’s criterion (Theorem 2.21), we really only
used the fact that $\mathbb{Z}_p^*$ has a unique element of multiplicative order 2. This exercise
develops a proof of a generalization of Euler’s criterion, based on the fundamental
theorem of finite abelian groups. Suppose G is an abelian group of even order n
that contains a unique element of order 2.

(a) Show that $G \cong \mathbb{Z}_{2^e} \times \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_k}$, where $e > 0$ and the $m_i$'s are odd
integers.

(b) Using part (a), show that $2G = G\{n/2\}$.

Exercise 6.50. Let G be a non-trivial, finite abelian group. Let s be the smallest 
positive integer such that $G = \langle a_1, \ldots, a_s \rangle$ for some $a_1, \ldots, a_s \in G$. Show that s
is equal to the value of t in Theorem 6.45. In particular, G is cyclic if and only if 
t = 1. 

Exercise 6.51. Suppose $G \cong \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_t}$. Let p be a prime, and let s be the
number of $m_i$'s divisible by p. Show that $G\{p\} \cong \mathbb{Z}_p^{\times s}$.

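Exercise 6.52. Suppose $G \cong \mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_t}$ with $m_i \mid m_{i+1}$ for $i = 1, \ldots, t - 1$,
and that H is a subgroup of G. Show that $H \cong \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_t}$, where $n_i \mid n_{i+1}$ for
$i = 1, \ldots, t - 1$ and $n_i \mid m_i$ for $i = 1, \ldots, t$.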

Exercise 6.53. Suppose that G is an abelian group such that for all $m > 0$,
we have $mG = G$ and $|G\{m\}| = m^2$ (note that G is not finite). Show that
$G\{m\} \cong \mathbb{Z}_m \times \mathbb{Z}_m$ for all $m > 0$. Hint: use induction on the number of prime
factors of m.
This chapter introduces the notion of a ring, more specifically, a commutative ring
with unity. While there is a lot of terminology associated with rings, the basic ideas
are fairly simple. Intuitively speaking, a ring is an algebraic structure with addition
and multiplication operations that behave as one would expect. 

7.1 Definitions, basic properties, and examples 

Definition 7.1. A commutative ring with unity is a set R together with addition 
and multiplication operations on R, such that: 

(i) the set R under addition forms an abelian group, and we denote the additive
identity by $0_R$;

(ii) multiplication is associative; that is, for all $a, b, c \in R$, we have $a(bc) = (ab)c$;

(iii) multiplication distributes over addition; that is, for all $a, b, c \in R$, we have
$a(b + c) = ab + ac$ and $(b + c)a = ba + ca$;

(iv) there exists a multiplicative identity; that is, there exists an element $1_R \in R$,
such that $1_R \cdot a = a = a \cdot 1_R$ for all $a \in R$;

(v) multiplication is commutative; that is, for all $a, b \in R$, we have $ab = ba$.

There are other, more general (and less convenient) types of rings — one can 
drop properties (iv) and (v), and still have what is called a ring. We shall not, 
however, be working with such general rings in this text. Therefore, to simplify 
terminology, from now on, by a “ring,” we shall always mean a commutative 
ring with unity. 

Let R be a ring. Notice that because of the distributive law, for any fixed $a \in R$,
the map from R to R that sends $b \in R$ to $ab \in R$ is a group homomorphism with
respect to the underlying additive group of R. We call this the a-multiplication
map.
We first state some simple facts:

Theorem 7.2. Let R be a ring. Then:

(i) the multiplicative identity $1_R$ is unique;

(ii) $0_R \cdot a = 0_R$ for all $a \in R$;

(iii) $(-a)b = -(ab) = a(-b)$ for all $a, b \in R$;

(iv) $(-a)(-b) = ab$ for all $a, b \in R$;

(v) $(ka)b = k(ab) = a(kb)$ for all $k \in \mathbb{Z}$ and $a, b \in R$.

Proof. Part (i) may be proved using the same argument as was used to prove
part (i) of Theorem 6.2. Parts (ii), (iii), and (v) follow directly from parts (i),
(ii), and (iii) of Theorem 6.19, using appropriate multiplication maps, discussed
above. Part (iv) follows from part (iii), along with part (iv) of Theorem 6.3:
$(-a)(-b) = -(a(-b)) = -(-(ab)) = ab$. □

Example 7.1. The set Z under the usual rules of multiplication and addition forms 
a ring. □ 

Example 7.2. For $n > 1$, the set $\mathbb{Z}_n$ under the rules of multiplication and addition
defined in §2.5 forms a ring. □ 
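For small n, Definition 7.1 can be checked for $\mathbb{Z}_n$ exhaustively. The sketch below represents $[a]_n$ by the integer a in [0, n) and verifies the ring axioms, together with parts (ii) and (iii) of Theorem 7.2, by brute force. It is a finite check for illustration, not a proof for general n, and the function name `check_Zn_ring` is hypothetical.

```python
from itertools import product


def check_Zn_ring(n):
    """Exhaustively check the ring axioms of Definition 7.1 for Z_n."""
    R = range(n)
    zero, one = 0, 1 % n          # 1 % n handles the trivial ring Z_1
    add = lambda a, b: (a + b) % n
    mul = lambda a, b: (a * b) % n
    neg = lambda a: (-a) % n
    for a, b, c in product(R, repeat=3):
        assert add(a, add(b, c)) == add(add(a, b), c)           # (i) associativity of +
        assert mul(a, mul(b, c)) == mul(mul(a, b), c)           # (ii)
        assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))   # (iii), left law
    for a, b in product(R, repeat=2):
        assert add(a, b) == add(b, a)                           # (i) commutativity of +
        assert mul(a, b) == mul(b, a)                           # (v); gives the right law of (iii)
        assert mul(neg(a), b) == neg(mul(a, b)) == mul(a, neg(b))  # Theorem 7.2(iii)
    for a in R:
        assert add(a, zero) == a and add(a, neg(a)) == zero     # (i) identity, inverses
        assert mul(one, a) == a                                 # (iv)
        assert mul(zero, a) == zero                             # Theorem 7.2(ii)
    return True


assert all(check_Zn_ring(n) for n in range(1, 13))
```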

Example 7.3. The set $\mathbb{Q}$ of rational numbers under the usual rules of multiplication
and addition forms a ring. □ 

Example 7.4. The set $\mathbb{R}$ of real numbers under the usual rules of multiplication
and addition forms a ring. □ 

Example 7.5. The set $\mathbb{C}$ of complex numbers under the usual rules of multiplication and addition forms a ring. Every $\alpha \in \mathbb{C}$ can be written (uniquely) as $\alpha = a + bi$,
where $a, b \in \mathbb{R}$ and $i = \sqrt{-1}$. If $\alpha' = a' + b'i$ is another complex number, with
$a', b' \in \mathbb{R}$, then

$$\alpha + \alpha' = (a + a') + (b + b')i \quad\text{and}\quad \alpha\alpha' = (aa' - bb') + (ab' + a'b)i.$$

The fact that C is a ring can be verified by direct calculation; however, we shall see 
later that this follows easily from more general considerations. 

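Recall the complex conjugation operation, which sends $\alpha$ to $\bar{\alpha} := a - bi$. One
can verify by direct calculation that complex conjugation is both additive and multiplicative; that is, $\overline{\alpha + \alpha'} = \bar{\alpha} + \bar{\alpha}'$ and $\overline{\alpha \cdot \alpha'} = \bar{\alpha} \cdot \bar{\alpha}'$.

The norm of $\alpha$ is $N(\alpha) := \alpha\bar{\alpha} = a^2 + b^2$. So we see that $N(\alpha)$ is a non-negative
real number, and is zero if and only if $\alpha = 0$. Moreover, from the multiplicativity
of complex conjugation, it is easy to see that the norm is multiplicative as well:
$N(\alpha\alpha') = \alpha\alpha'\,\overline{\alpha\alpha'} = \alpha\alpha'\bar{\alpha}\bar{\alpha}' = \alpha\bar{\alpha}\alpha'\bar{\alpha}' = N(\alpha)N(\alpha')$. □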
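The formulas above can be exercised numerically. In the sketch below, a complex number $a + bi$ is represented by the pair (a, b); using integer coordinates keeps the arithmetic exact, so we can check the multiplicativity of conjugation and of the norm on sample values. This is an illustration on a finite sample, not a proof, and the helper names `cmul`, `conj`, and `norm` are ours.

```python
from itertools import product


def cmul(x, y):
    """(a + bi)(a' + b'i) = (aa' - bb') + (ab' + a'b)i, on pairs (a, b)."""
    (a, b), (a2, b2) = x, y
    return (a * a2 - b * b2, a * b2 + a2 * b)


def conj(x):
    """Complex conjugation: a + bi -> a - bi."""
    a, b = x
    return (a, -b)


def norm(x):
    """N(alpha) = alpha * conj(alpha) = a^2 + b^2 (the imaginary part is 0)."""
    re, im = cmul(x, conj(x))
    assert im == 0
    return re


sample = list(product(range(-3, 4), repeat=2))
for x, y in product(sample, repeat=2):
    assert conj(cmul(x, y)) == cmul(conj(x), conj(y))   # conjugation is multiplicative
    assert norm(cmul(x, y)) == norm(x) * norm(y)        # so is the norm
```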
Example 7.6. Consider the set F of all arithmetic functions, that is, functions
mapping positive integers to reals. Let us define addition of arithmetic functions
point-wise (i.e., $(f + g)(n) = f(n) + g(n)$ for all positive integers n) and multiplication using the Dirichlet product, introduced in §2.9. The reader should verify
that with addition and multiplication so defined, F forms a ring, where the all-zero 
function is the additive identity, and the special function I defined in §2.9 is the 
multiplicative identity. □ 
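Here is a minimal sketch of this ring structure in Python, under the assumption that we only care about values at $1, \ldots, N$. Since the Dirichlet product at n, $\sum_{d \mid n} f(d)\,g(n/d)$, depends only on the values of f and g at divisors of n, truncating every function to $\{1, \ldots, N\}$ is compatible with both operations. Functions are stored as lists indexed by n (index 0 unused), integer values are used for exactness, and the names `dirichlet`, `pointwise_add`, and `identity` are illustrative.

```python
def dirichlet(f, g, N):
    """Dirichlet product h(n) = sum over d | n of f(d) * g(n // d), for 1 <= n <= N.

    f and g are lists of length N + 1 indexed by n; index 0 is unused."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for k in range(1, N // d + 1):     # n = d * k runs over the multiples of d
            h[d * k] += f[d] * g[k]
    return h


def pointwise_add(f, g):
    """Point-wise sum of two truncated arithmetic functions."""
    return [a + b for a, b in zip(f, g)]


def identity(N):
    """The function I with I(1) = 1 and I(n) = 0 for n > 1."""
    return [0, 1] + [0] * (N - 1)


N = 30
one = [0] + [1] * N               # the constant function 1
ident = list(range(N + 1))        # the function n -> n
sigma = dirichlet(ident, one, N)  # n -> sum of the divisors of n
assert sigma[12] == 1 + 2 + 3 + 4 + 6 + 12
assert dirichlet(ident, identity(N), N) == ident              # I is the identity
assert dirichlet(ident, one, N) == dirichlet(one, ident, N)   # commutativity
assert dirichlet(sigma, pointwise_add(one, ident), N) == pointwise_add(
    dirichlet(sigma, one, N), dirichlet(sigma, ident, N))     # distributivity
```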

Example 7.7. Generalizing Example 6.18, if $R_1, \ldots, R_k$ are rings, then we can
form the direct product $S := R_1 \times \cdots \times R_k$, which consists of all k-tuples
$(a_1, \ldots, a_k)$ with $a_1 \in R_1, \ldots, a_k \in R_k$. We can view S in a natural way as
a ring, with addition and multiplication defined component-wise. The additive
identity is $(0_{R_1}, \ldots, 0_{R_k})$ and the multiplicative identity is $(1_{R_1}, \ldots, 1_{R_k})$. When
$R = R_1 = \cdots = R_k$, the k-wise direct product of R is denoted $R^{\times k}$. □

Example 7.8. Generalizing Example 6.19, if I is an arbitrary set and R is a ring,
then $\mathrm{Map}(I, R)$, which is the set of all functions $f : I \to R$, may be naturally viewed as a ring, with addition and multiplication defined point-wise: for
$f, g \in \mathrm{Map}(I, R)$, we define

$$(f + g)(i) := f(i) + g(i) \quad\text{and}\quad (f \cdot g)(i) := f(i) \cdot g(i) \quad \text{for all } i \in I.$$

We leave it to the reader to verify that $\mathrm{Map}(I, R)$ is indeed a ring, where the additive identity is the all-zero function, and the multiplicative identity is the all-one
function. □

A ring R may be trivial, meaning that it consists of the single element $0_R$, with
$0_R + 0_R = 0_R$ and $0_R \cdot 0_R = 0_R$. Certainly, if R is trivial, then $1_R = 0_R$. Conversely,
if $1_R = 0_R$, then for all $a \in R$, we have $a = 1_R \cdot a = 0_R \cdot a = 0_R$, and hence R
is trivial. Trivial rings are not very interesting, but they naturally arise in certain 
constructions. 

For $a_1, \ldots, a_k \in R$, the product $a_1 \cdots a_k$ needs no parentheses, because multiplication is associative; moreover, we can reorder the $a_i$'s without changing the
value of the product, since multiplication is commutative. We can also write this
product as $\prod_{i=1}^k a_i$. By convention, such a product is defined to be $1_R$ when $k = 0$.
When $a = a_1 = \cdots = a_k$, we can write this product as $a^k$. The reader may verify
the usual power laws: for all $a, b \in R$, and all non-negative integers k and $\ell$, we
have

$$(a^\ell)^k = a^{k\ell} = (a^k)^\ell, \quad a^{k+\ell} = a^k a^\ell, \quad (ab)^k = a^k b^k. \tag{7.1}$$
For all $a_1, \ldots, a_k, b_1, \ldots, b_\ell \in R$, the distributive law implies

$$(a_1 + \cdots + a_k)(b_1 + \cdots + b_\ell) = \sum_{\substack{1 \le i \le k \\ 1 \le j \le \ell}} a_i b_j.$$

A ring R is in particular an abelian group with respect to addition. We shall call
a subgroup of the additive group of R an additive subgroup of R. The characteristic of R is defined as the exponent of this group (see §6.5). Note that for all
$m \in \mathbb{Z}$ and $a \in R$, we have

$$ma = m(1_R \cdot a) = (m \cdot 1_R)a,$$

so that if $m \cdot 1_R = 0_R$, then $ma = 0_R$ for all $a \in R$. Thus, if the additive order of
$1_R$ is infinite, the characteristic of R is zero, and otherwise, the characteristic of R
is equal to the additive order of $1_R$.

Example 7.9. The ring $\mathbb{Z}$ has characteristic zero, $\mathbb{Z}_n$ has characteristic n, and
$\mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2}$ has characteristic $\mathrm{lcm}(n_1, n_2)$. □
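Example 7.9 can be spot-checked directly from the description of the characteristic as the additive order of $1_R$. The sketch below, with the hypothetical function names `characteristic` and `lcm_all`, searches for the least $m \ge 1$ such that $m \cdot (1, \ldots, 1) = 0$ in $\mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$ and compares the result with $\mathrm{lcm}(n_1, \ldots, n_k)$.

```python
from functools import reduce
from math import gcd


def characteristic(moduli):
    """Additive order of (1, ..., 1) in Z_{n_1} x ... x Z_{n_k}, found by search;
    m * 1 = 0 in Z_n exactly when n divides m."""
    m = 1
    while any(m % n != 0 for n in moduli):
        m += 1
    return m


def lcm_all(moduli):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda x, y: x * y // gcd(x, y), moduli, 1)


for n1 in range(1, 20):
    for n2 in range(1, 20):
        assert characteristic([n1, n2]) == lcm_all([n1, n2])
```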

When there is no possibility for confusion, one may write “0” instead of “$0_R$”
and “1” instead of “$1_R$.” Also, one may write, for example, $2_R$ to denote $2 \cdot 1_R$,
$3_R$ to denote $3 \cdot 1_R$, and so on; moreover, where the context is clear, one may use
an implicit “type cast,” so that $m \in \mathbb{Z}$ really means $m \cdot 1_R$.

Exercise 7.1. Show that the familiar binomial theorem (see §A2) holds in an
arbitrary ring R: that is, for all $a, b \in R$ and every positive integer n, we have

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.$$

Exercise 7.2. Let R be a ring. For additive subgroups A and B of R, we
define their ring-theoretic product AB as the set of all elements of R that can
be expressed as

$$a_1 b_1 + \cdots + a_k b_k$$

for some $a_1, \ldots, a_k \in A$ and $b_1, \ldots, b_k \in B$; by definition, this set includes the
“empty sum” $0_R$. Show that for all additive subgroups A, B, and C of R:

(a) AB is also an additive subgroup of R;

(b) $AB = BA$;

(c) $A(BC) = (AB)C$;

(d) $A(B + C) = AB + AC$.
7.1.1 Divisibility, units, and fields

For elements a, b in a ring R, we say that a divides b if $ar = b$ for some $r \in R$. If
a divides b, we write $a \mid b$, and we may say that a is a divisor of b, or that b is a
multiple of a, or that b is divisible by a. If a does not divide b, then we write $a \nmid b$.
Note that Theorem 1.1 holds for an arbitrary ring.

We call $a \in R$ a unit if $a \mid 1_R$, that is, if $ar = 1_R$ for some $r \in R$. Using the
same argument as was used to prove part (ii) of Theorem 6.2, it is easy to see that r
is uniquely determined; it is called the multiplicative inverse of a, and we denote
it by $a^{-1}$. Also, for $b \in R$, we may write $b/a$ to denote $ba^{-1}$. Evidently, if a is a
unit, then $a \mid b$ for every $b \in R$.

We denote the set of units by $R^*$. It is easy to see that $1_R \in R^*$. Moreover,
$R^*$ is closed under multiplication; indeed, if a and b are elements of $R^*$, then
$(ab)^{-1} = a^{-1}b^{-1}$. It follows that with respect to the multiplication operation of
the ring, $R^*$ is an abelian group, called the multiplicative group of units of R.
If $a \in R^*$ and k is a positive integer, then $a^k \in R^*$; indeed, the multiplicative
inverse of $a^k$ is $(a^{-1})^k$, which we may also write as $a^{-k}$ (which is consistent with
our notation for abelian groups). For all $a, b \in R^*$, the identities (7.1) hold for all
integers k and $\ell$.

If R is non-trivial and every non-zero element of R has a multiplicative inverse, 
then R is called a field. 

Example 7.10. The only units in the ring Z are ±1. Hence, Z is not a field. □ 

Example 7.11. Let n be a positive integer. The units in $\mathbb{Z}_n$ are the residue classes
$[a]_n$ with $\gcd(a, n) = 1$. In particular, if n is prime, all non-zero residue classes are
units, and if n is composite, some non-zero residue classes are not units. Hence, $\mathbb{Z}_n$
is a field if and only if n is prime. The notation $\mathbb{Z}_n^*$ introduced in this section for the
group of units of the ring $\mathbb{Z}_n$ is consistent with the notation introduced in §2.5. □
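The units of $\mathbb{Z}_n$ and their inverses can be computed with the extended Euclidean algorithm: if $as + nt = 1$, then $[s]_n$ is the inverse of $[a]_n$, and if $\gcd(a, n) > 1$ no inverse exists. The sketch below (the names `ext_gcd`, `inverse_mod`, and `is_field_Zn` are ours) uses this to test which $\mathbb{Z}_n$ are fields, matching the criterion that n be prime.

```python
def ext_gcd(a, b):
    """Return (d, s, t) with d = gcd(a, b) = a*s + b*t, for a, b >= 0."""
    if b == 0:
        return (a, 1, 0)
    d, s, t = ext_gcd(b, a % b)
    return (d, t, s - (a // b) * t)


def inverse_mod(a, n):
    """Inverse of [a]_n as an integer in [0, n), or None if [a]_n is not a unit."""
    d, s, _ = ext_gcd(a % n, n)
    return s % n if d == 1 else None


def is_field_Zn(n):
    """Z_n is a field iff n > 1 (non-trivial) and all non-zero classes are units."""
    return n > 1 and all(inverse_mod(a, n) is not None for a in range(1, n))


print(inverse_mod(7, 15))   # 13, since 7 * 13 = 91 = 6 * 15 + 1
print(inverse_mod(6, 15))   # None, since gcd(6, 15) = 3
print([n for n in range(1, 30) if is_field_Zn(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```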

Example 7.12. Every non-zero element of $\mathbb{Q}$ is a unit. Hence, $\mathbb{Q}$ is a field. □

Example 7.13. Every non-zero element of $\mathbb{R}$ is a unit. Hence, $\mathbb{R}$ is a field. □

Example 7.14. For non-zero $\alpha = a + bi \in \mathbb{C}$, with $a, b \in \mathbb{R}$, we have $c := N(\alpha) = a^2 + b^2 > 0$.
It follows that the complex number $\bar{\alpha}c^{-1} = (ac^{-1}) + (-bc^{-1})i$ is the
multiplicative inverse of $\alpha$, since $\alpha \cdot \bar{\alpha}c^{-1} = (\alpha\bar{\alpha})c^{-1} = 1$. Hence, every non-zero
element of $\mathbb{C}$ is a unit, and so $\mathbb{C}$ is a field. □
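The formula $\alpha^{-1} = \bar{\alpha}c^{-1}$ with $c = N(\alpha)$ can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction` and represents $a + bi$ as a pair (a, b); the function names `cmul` and `cinverse` are illustrative.

```python
from fractions import Fraction


def cmul(x, y):
    """(a + bi)(a' + b'i) = (aa' - bb') + (ab' + a'b)i, on pairs (a, b)."""
    (a, b), (a2, b2) = x, y
    return (a * a2 - b * b2, a * b2 + a2 * b)


def cinverse(x):
    """Inverse of a non-zero a + bi, computed as conj(alpha) * c^{-1}, c = a^2 + b^2."""
    a, b = Fraction(x[0]), Fraction(x[1])
    c = a * a + b * b
    if c == 0:
        raise ZeroDivisionError("0 is not a unit")
    return (a / c, -b / c)


alpha = (3, 4)
inv = cinverse(alpha)
print(inv)                 # (Fraction(3, 25), Fraction(-4, 25))
print(cmul(alpha, inv))    # (Fraction(1, 1), Fraction(0, 1))
```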

Example 7.15. For rings $R_1, \ldots, R_k$, it is easy to see that the multiplicative group
of units of the direct product $R_1 \times \cdots \times R_k$ is equal to $R_1^* \times \cdots \times R_k^*$. Indeed, by
definition, $(a_1, \ldots, a_k)$ has a multiplicative inverse if and only if each individual $a_i$
does. □
Example 7.16. If I is a set and R is a ring, then the units in $\mathrm{Map}(I, R)$ are those
functions $f : I \to R$ such that $f(i) \in R^*$ for all $i \in I$. □

Example 7.17. Consider the ring F of arithmetic functions defined in Example 7.6.
By the result of Exercise 2.54, $F^* = \{f \in F : f(1) \neq 0\}$. □

7.1.2 Zero divisors and integral domains 

Let R be a ring. If a and b are non-zero elements of R such that ab = 0, then a 
and b are both called zero divisors. If R is non-trivial and has no zero divisors, 
then it is called an integral domain. Note that if a is a unit in R, it cannot be a 
zero divisor (if $ab = 0$, then multiplying both sides of this equation by $a^{-1}$ yields
b = 0). In particular, it follows that every field is an integral domain. 

Example 7.18. Z is an integral domain. □ 

Example 7.19. For $n > 1$, $\mathbb{Z}_n$ is an integral domain if and only if n is prime. In
particular, if n is composite, so $n = ab$ with $1 < a < n$ and $1 < b < n$, then $[a]_n$
and $[b]_n$ are zero divisors: $[a]_n[b]_n = [0]_n$ but $[a]_n \neq [0]_n$ and $[b]_n \neq [0]_n$. □
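The following brute-force sketch (function names are ours) lists the zero divisors of $\mathbb{Z}_n$ directly from the definition and confirms, for small n, that $\mathbb{Z}_n$ is an integral domain exactly when n is prime. It also checks that, in $\mathbb{Z}_n$, the non-zero classes that are not units (those $[a]_n$ with $\gcd(a, n) > 1$) are precisely the zero divisors.

```python
from math import gcd


def zero_divisors(n):
    """Non-zero [a]_n such that [a]_n [b]_n = [0]_n for some non-zero [b]_n."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


def is_integral_domain_Zn(n):
    """Non-trivial (n > 1) and free of zero divisors."""
    return n > 1 and not zero_divisors(n)


print(zero_divisors(12))    # [2, 3, 4, 6, 8, 9, 10]
print([n for n in range(1, 30) if is_integral_domain_Zn(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
for n in range(2, 40):
    assert zero_divisors(n) == [a for a in range(1, n) if gcd(a, n) > 1]
```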

Example 7.20. $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$ are fields, and hence are also integral domains. □

Example 7.21. For two non-trivial rings $R_1, R_2$, an element $(a_1, a_2) \in R_1 \times R_2$ is
a zero divisor if and only if $a_1$ is a zero divisor, $a_2$ is a zero divisor, or exactly one
of $a_1$ or $a_2$ is zero. In particular, $R_1 \times R_2$ is not an integral domain. □

The next two theorems establish certain results that are analogous to familiar 
facts about integer divisibility. These results hold in a general ring, provided one 
avoids zero divisors. The first is a cancellation law: 

Theorem 7.3. If R is a ring, and $a, b, c \in R$ such that $a \neq 0$ and a is not a zero
divisor, then $ab = ac$ implies $b = c$.

Proof. $ab = ac$ implies $a(b - c) = 0$. The fact that $a \neq 0$ and a is not a zero divisor
implies that we must have $b - c = 0$, and so $b = c$. □

Theorem 7.4. Let R be a ring. 

(i) Suppose $a, b \in R$, and that either a or b is not a zero divisor. Then $a \mid b$
and $b \mid a$ if and only if $ar = b$ for some $r \in R^*$.

(ii) Suppose $a, b \in R$, $a \mid b$, $a \neq 0$, and a is not a zero divisor. Then there
exists a unique $r \in R$ such that $ar = b$, which we denote by $b/a$.

Proof. For the first statement, if $ar = b$ for some $r \in R^*$, then we also have
$br^{-1} = a$; thus, $a \mid b$ and $b \mid a$. For the converse, suppose that $a \mid b$ and $b \mid a$. We
may assume that b is not a zero divisor (otherwise, exchange the roles of a and b).
We may also assume that b is non-zero (otherwise, $b \mid a$ implies $a = 0$, and so the
conclusion holds with any $r \in R^*$). Now, $a \mid b$ implies $ar = b$ for some $r \in R$, and $b \mid a$
implies $br' = a$ for some $r' \in R$, and hence $b = ar = br'r$. Canceling b from both
sides of the equation $b = br'r$, we obtain $1 = r'r$, and so r is a unit.

For the second statement, $a \mid b$ means $ar = b$ for some $r \in R$. Moreover, this
value of r is unique: if $ar = b = ar'$, then we may cancel a, obtaining $r = r'$. □

Of course, in the previous two theorems, if the ring is an integral domain, then 
there are no zero divisors, and so the hypotheses may be simplified in this case, 
dropping the explicit requirement that certain elements are not zero divisors. In
particular, if a, b, and c are elements of an integral domain, such that $ab = ac$ and
$a \neq 0$, then we can cancel a, obtaining $b = c$.

The next two theorems state some facts which pertain specifically to integral 
domains. 

Theorem 7.5. The characteristic of an integral domain is either zero or a prime. 

Proof. By way of contradiction, suppose that D is an integral domain with characteristic m that is neither zero nor prime. Since, by definition, D is not a trivial
ring, we cannot have $m = 1$, and so m must be composite. Say $m = st$, where
$1 < s < m$ and $1 < t < m$. Since m is the additive order of $1_D$, it follows that
$s \cdot 1_D \neq 0_D$ and $t \cdot 1_D \neq 0_D$; moreover, since D is an integral domain, it follows
that $(s \cdot 1_D)(t \cdot 1_D) \neq 0_D$. So we have

$$0_D = m \cdot 1_D = (st) \cdot 1_D = (s \cdot 1_D)(t \cdot 1_D) \neq 0_D,$$

a contradiction. □

Theorem 7.6. Every finite integral domain is a field.

Proof. Let D be a finite integral domain, and let a be any non-zero element of 
D. Consider the a-multiplication map that sends $b \in D$ to $ab$, which is a group
homomorphism on the additive group of D. Since a is not a zero divisor, it follows
that the kernel of the a-multiplication map is $\{0_D\}$, hence the map is injective, and
by finiteness, it must be surjective as well. In particular, there must be an element
$b \in D$ such that $ab = 1_D$. □
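The argument in the proof of Theorem 7.6 can be followed literally for a small ring: tabulate the a-multiplication map, observe that it is injective (hence a bijection of the finite set D), and read off the preimage of $1_D$. The sketch below does this for $D = \mathbb{Z}_p$; the function name is hypothetical, and for large p the extended Euclidean algorithm is far more efficient.

```python
def inverse_by_injectivity(a, n):
    """Follow the proof of Theorem 7.6 in Z_n: tabulate b -> a*b; if this map is
    injective on the finite set Z_n, it is surjective, so some b has a*b = 1."""
    images = [(a * b) % n for b in range(n)]   # images[b] = a*b
    if len(set(images)) != n:
        raise ValueError(f"{a} is zero or a zero divisor in Z_{n}")
    return images.index(1 % n)


for p in [2, 3, 5, 7, 11, 13]:
    for a in range(1, p):
        assert (a * inverse_by_injectivity(a, p)) % p == 1
```

In $\mathbb{Z}_6$, by contrast, the 2-multiplication map is not injective (2·0 = 2·3 = 0), so the function raises an error for a = 2.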

Theorem 7.7. Every finite field F must be of cardinality $p^w$, where p is prime, w
is a positive integer, and p is the characteristic of F.

Proof. By Theorem 7.5, the characteristic of F is either zero or a prime, and since
F is finite, it must be prime. Let p denote the characteristic. By definition, p is
the exponent of the additive group of F, and by Theorem 6.43, the primes dividing
the exponent are the same as the primes dividing the order, and hence F must have
cardinality $p^w$ for some positive integer w. □

Of course, for every prime p, $\mathbb{Z}_p$ is a finite field of cardinality p. As we shall
see later (in Chapter 19), for every prime p and positive integer w, there exists a
field of cardinality $p^w$. Later in this chapter, we shall see some specific examples
of finite fields of cardinality $p^2$ (Examples 7.40, 7.59, and 7.60).

Exercise 7.3. Let R be a ring, and let $a, b \in R$ such that $ab \neq 0$. Show that ab is
a zero divisor if and only if a is a zero divisor or b is a zero divisor.

Exercise 7.4. Suppose that R is a non-trivial ring in which the cancellation law 
holds in general: for all $a, b, c \in R$, if $a \neq 0$ and $ab = ac$, then $b = c$. Show that R
is an integral domain. 

Exercise 7.5. Let R be a ring of characteristic m > 0, and let n be an integer. 
Show that: 

(a) if $\gcd(n, m) = 1$, then $n \cdot 1_R$ is a unit;

(b) if $1 < \gcd(n, m) < m$, then $n \cdot 1_R$ is a zero divisor;

(c) otherwise, $n \cdot 1_R = 0$.

Exercise 7.6. Let D be an integral domain, $m \in \mathbb{Z}$, and $a \in D$. Show that
ma = 0 if and only if m is a multiple of the characteristic of D or a = 0. 

Exercise 7.7. Show that for all $n > 1$, and for all $a, b \in \mathbb{Z}_n$, if $a \mid b$ and $b \mid a$,
then $ar = b$ for some $r \in \mathbb{Z}_n^*$. Hint: this result does not follow from part (i) of
Theorem 7.4, as we allow a and b to be zero divisors here; first consider the case
where n is a prime power.

Exercise 7.8. Show that the ring F of arithmetic functions defined in Exam- 
ple 7.6 is an integral domain. 

Exercise 7.9. This exercise depends on results in §6.6. Using the fundamental 
theorem of finite abelian groups, show that the additive group of a finite field of 
characteristic p and cardinality $p^w$ is isomorphic to $\mathbb{Z}_p^{\times w}$.


7.1.3 Subrings 

Definition 7.8. A subset S of a ring R is called a subring if 

(i) S is an additive subgroup of R, 

(ii) S is closed under multiplication, and 
(iii) $1_R \in S$.
It is clear that the operations of addition and multiplication on a ring R make
a subring S of R into a ring, where $0_R$ is the additive identity of S and $1_R$ is the
multiplicative identity of S. One may also call R an extension ring of S.

Some texts do not require that $1_R$ belongs to a subring S, and instead require
only that S contains a multiplicative identity, which may be different from that of
R. This is perfectly reasonable, but for simplicity, we restrict ourselves to the case
where $1_R \in S$.

Expanding the above definition, we see that a subset S of R is a subring if and
only if $1_R \in S$ and for all $a, b \in S$, we have

$$a + b \in S, \quad -a \in S, \quad \text{and} \quad ab \in S.$$

In fact, to verify that S is a subring, it suffices to show that $-1_R \in S$ and that S is
closed under addition and multiplication; indeed, if $-1_R \in S$ and S is closed under
multiplication, then S is closed under negation, and further, $1_R = -(-1_R) \in S$.

Example 7.22. Z is a subring of Q. □ 

Example 7.23. $\mathbb{Q}$ is a subring of $\mathbb{R}$. □

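Example 7.24. $\mathbb{R}$ is a subring of $\mathbb{C}$. Note that for all $\alpha := a + bi \in \mathbb{C}$, with $a, b \in \mathbb{R}$,
we have $\bar{\alpha} = \alpha \iff a + bi = a - bi \iff b = 0$. That is, $\bar{\alpha} = \alpha \iff \alpha \in \mathbb{R}$. □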

Example 7.25. The set $\mathbb{Z}[i]$ of complex numbers of the form $a + bi$, with $a, b \in \mathbb{Z}$,
is a subring of $\mathbb{C}$. It is called the ring of Gaussian integers. Since $\mathbb{C}$ is a field, it
contains no zero divisors, and hence $\mathbb{Z}[i]$ contains no zero divisors either. Hence,
$\mathbb{Z}[i]$ is an integral domain.

Let us determine the units of $\mathbb{Z}[i]$. Suppose $\alpha \in \mathbb{Z}[i]$ is a unit, so that there exists
$\alpha' \in \mathbb{Z}[i]$ such that $\alpha\alpha' = 1$. Taking norms, we obtain

$$1 = N(1) = N(\alpha\alpha') = N(\alpha)N(\alpha').$$

Since the norm of any Gaussian integer is itself a non-negative integer, and since
$N(\alpha)N(\alpha') = 1$, we must have $N(\alpha) = 1$. Now, if $\alpha = a + bi$, with $a, b \in \mathbb{Z}$, then
$1 = N(\alpha) = a^2 + b^2$, which implies that $\alpha = \pm 1$ or $\alpha = \pm i$. Conversely, it is easy
to see that $\pm 1$ and $\pm i$ are indeed units, and so these are the only units in $\mathbb{Z}[i]$. □
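As a small computational companion to this argument, the sketch below enumerates Gaussian integers $a + bi$ with $a^2 + b^2 = 1$ and verifies that each has an inverse in $\mathbb{Z}[i]$, namely its conjugate. The search box is a convenience only, since $a^2 + b^2 = 1$ already forces $|a|, |b| \le 1$; the names `cmul` and `norm_one` are ours.

```python
def cmul(x, y):
    """(a + bi)(c + di) = (ac - bd) + (ad + cb)i, on pairs of integers."""
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + c * b)


def norm_one(bound=5):
    """All (a, b) with |a|, |b| <= bound and a^2 + b^2 = 1."""
    return sorted((a, b) for a in range(-bound, bound + 1)
                  for b in range(-bound, bound + 1)
                  if a * a + b * b == 1)


units = norm_one()
print(units)   # [(-1, 0), (0, -1), (0, 1), (1, 0)], i.e. -1, -i, i, 1
for a, b in units:
    # (a + bi)(a - bi) = a^2 + b^2 = 1, so the conjugate is the inverse
    assert cmul((a, b), (a, -b)) == (1, 0)
```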

Example 7.26. Let m be a positive integer, and let $\mathbb{Q}^{(m)}$ be the set of rational
numbers which can be written as $a/b$, where a and b are integers, and b is relatively prime to m. Then $\mathbb{Q}^{(m)}$ is a subring of $\mathbb{Q}$, since for all $a, b, c, d \in \mathbb{Z}$ with
$\gcd(b, m) = 1$ and $\gcd(d, m) = 1$, we have

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd} \quad\text{and}\quad \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd},$$

and since $\gcd(bd, m) = 1$, it follows that the sum and product of any two elements
of $\mathbb{Q}^{(m)}$ are again in $\mathbb{Q}^{(m)}$. Clearly, $\mathbb{Q}^{(m)}$ contains $-1$, and so it follows that $\mathbb{Q}^{(m)}$ is
a subring of $\mathbb{Q}$. The units of $\mathbb{Q}^{(m)}$ are precisely those rational numbers of the form
$a/b$, where $\gcd(a, m) = \gcd(b, m) = 1$. □
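Membership in $\mathbb{Q}^{(m)}$ can be tested on the reduced form of a rational number: if $x = a/b$ with $\gcd(b, m) = 1$, then the reduced denominator divides b and so is also prime to m. The sketch below relies on this observation and on Python's `fractions.Fraction`, which always stores a fraction in lowest terms with a positive denominator. The function names and the sample values are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import gcd


def in_Q_m(x, m):
    """x lies in Q^(m) iff its reduced denominator is relatively prime to m."""
    return gcd(Fraction(x).denominator, m) == 1


def is_unit_Q_m(x, m):
    """Units of Q^(m): non-zero a/b (lowest terms) with gcd(a, m) = gcd(b, m) = 1."""
    x = Fraction(x)
    return x != 0 and gcd(x.numerator, m) == 1 and gcd(x.denominator, m) == 1


m = 6
sample = [Fraction(1, 5), Fraction(-7, 25), Fraction(4, 7), Fraction(3, 1), Fraction(0)]
assert all(in_Q_m(x, m) for x in sample)
for x, y in product(sample, repeat=2):       # closure under + and *
    assert in_Q_m(x + y, m) and in_Q_m(x * y, m)
assert not in_Q_m(Fraction(1, 2), m)
for x in sample:                             # x is a unit iff 1/x is in Q^(m)
    assert is_unit_Q_m(x, m) == (x != 0 and in_Q_m(1 / x, m))
```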

Example 7.27. Suppose R is a non-trivial ring. Then the set $\{0_R\}$ is not a subring
of R: although it satisfies the first two requirements of the definition of a subring,
it does not satisfy the third. □

Generalizing the argument in Example 7.25, it is clear that every subring of an 
integral domain is itself an integral domain. However, it is not the case that a 
subring of a field is always a field: the subring Z of Q is a counter-example. If F' 
is a subring of a field F, and F' is itself a field, then we say that F' is a subfield of 
F, and that F is an extension field of F'. For example, $\mathbb{Q}$ is a subfield of $\mathbb{R}$, which
in turn is a subfield of C. 


Exercise 7.10. Show that if S is a subring of a ring R, then a set $T \subseteq S$ is a
subring of R if and only if T is a subring of S.

Exercise 7.11. Show that if S and T are subrings of R, then so is $S \cap T$.

Exercise 7.12. Let $S_1$ be a subring of $R_1$, and $S_2$ a subring of $R_2$. Show that
$S_1 \times S_2$ is a subring of $R_1 \times R_2$.

Exercise 7.13. Suppose that S and T are subrings of a ring R. Show that their 
ring-theoretic product ST (see Exercise 7.2) is a subring of R that contains $S \cup T$,
and is the smallest such subring. 

Exercise 7.14. Show that the set $\mathbb{Q}[i]$ of complex numbers of the form $a + bi$,
with $a, b \in \mathbb{Q}$, is a subfield of $\mathbb{C}$.

Exercise 7.15. Consider the ring $\mathrm{Map}(\mathbb{R}, \mathbb{R})$ of functions $f : \mathbb{R} \to \mathbb{R}$, with
addition and multiplication defined point-wise.

(a) Show that $\mathrm{Map}(\mathbb{R}, \mathbb{R})$ is not an integral domain, and that $\mathrm{Map}(\mathbb{R}, \mathbb{R})^*$ consists of those functions that never vanish.

(b) Let $a, b \in \mathrm{Map}(\mathbb{R}, \mathbb{R})$. Show that if $a \mid b$ and $b \mid a$, then $ar = b$ for some
$r \in \mathrm{Map}(\mathbb{R}, \mathbb{R})^*$.

(c) Let C be the subset of $\mathrm{Map}(\mathbb{R}, \mathbb{R})$ of continuous functions. Show that C is
a subring of $\mathrm{Map}(\mathbb{R}, \mathbb{R})$, and that all functions in $C^*$ are either everywhere
positive or everywhere negative.

(d) Find elements $a, b \in C$, such that in the ring C, we have $a \mid b$ and $b \mid a$, yet
there is no $r \in C^*$ such that $ar = b$.
7.2 Polynomial rings

If R is a ring, then we can form the ring of polynomials $R[X]$, consisting of
all polynomials $g = a_0 + a_1X + \cdots + a_kX^k$ in the indeterminate, or “formal”
variable, X, with coefficients $a_i$ in R, and with addition and multiplication defined
in the usual way.

Example 7.28. Let us define a few polynomials over the ring $\mathbb{Z}$:

$$a := 3 + X^2, \quad b := 1 + 2X - X^3, \quad c := 5, \quad d := 1 + X, \quad e := X, \quad f := 4X^3.$$

We have

$$a + b = 4 + 2X + X^2 - X^3, \quad a \cdot b = 3 + 6X + X^2 - X^3 - X^5, \quad cd + ef = 5 + 5X + 4X^4.$$ □
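Here is a minimal sketch of polynomial arithmetic over $\mathbb{Z}$, representing a polynomial by its list of coefficients $[a_0, a_1, \ldots, a_k]$. Addition is coefficient-wise, and the coefficient of $X^i$ in a product is the sum of $a_j b_k$ over all $j + k = i$; these are the rules made formal in §7.2.1. The list representation and the function names are our own choices, and the asserts reproduce the computations of Example 7.28.

```python
def trim(g):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    g = list(g)
    while g and g[-1] == 0:
        g.pop()
    return g


def poly_add(g, h):
    """Coefficient-wise sum: the coefficient of X^i is a_i + b_i."""
    n = max(len(g), len(h))
    g = list(g) + [0] * (n - len(g))
    h = list(h) + [0] * (n - len(h))
    return trim([x + y for x, y in zip(g, h)])


def poly_mul(g, h):
    """Product: the coefficient of X^i is the sum of a_j * b_k over j + k = i."""
    if not g or not h:
        return []
    p = [0] * (len(g) + len(h) - 1)
    for j, x in enumerate(g):
        for k, y in enumerate(h):
            p[j + k] += x * y
    return trim(p)


# Example 7.28, with coefficient lists [a_0, a_1, ...]
a, b = [3, 0, 1], [1, 2, 0, -1]
c, d, e, f = [5], [1, 1], [0, 1], [0, 0, 0, 4]
assert poly_add(a, b) == [4, 2, 1, -1]             # 4 + 2X + X^2 - X^3
assert poly_mul(a, b) == [3, 6, 1, -1, 0, -1]      # 3 + 6X + X^2 - X^3 - X^5
assert poly_add(poly_mul(c, d), poly_mul(e, f)) == [5, 5, 0, 0, 4]  # 5 + 5X + 4X^4
```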

As illustrated in the previous example, elements of R are also considered to be
polynomials. Such polynomials are called constant polynomials. The set R of
constant polynomials forms a subring of $R[X]$. In particular, $0_R$ is the additive
identity in $R[X]$ and $1_R$ is the multiplicative identity in $R[X]$. Note that if R is the
trivial ring, then so is $R[X]$; also, if R is a subring of E, then $R[X]$ is a subring of
$E[X]$.

So as to keep the distinction between ring elements and indeterminates clear, we
shall use the symbol “X” only to denote the latter. Also, for a polynomial $g \in R[X]$,
we shall in general write this simply as “g,” and not as “g(X).” Of course, the
choice of the symbol “X” is arbitrary; occasionally, we may use another symbol,
such as “Y,” as an alternative.


7.2.1 Formalities 

For completeness, we present a more formal definition of the ring $R[X]$. The
reader should bear in mind that this formalism is rather tedious, and may be more
distracting than it is enlightening. Formally, a polynomial $g \in R[X]$ is an infinite
sequence $\{a_i\}_{i=0}^\infty$, where each $a_i \in R$, but only finitely many of the $a_i$'s are non-zero (intuitively, $a_i$ represents the coefficient of $X^i$). For each non-negative integer
j, it will be convenient to define the function $\varepsilon_j : R \to R[X]$ that maps $c \in R$ to
the sequence $\{c_i\}_{i=0}^\infty \in R[X]$, where $c_j := c$ and $c_i := 0_R$ for $i \neq j$ (intuitively,
$\varepsilon_j(c)$ represents the polynomial $cX^j$).

For

$$g = \{a_i\}_{i=0}^\infty \in R[X] \quad\text{and}\quad h = \{b_i\}_{i=0}^\infty \in R[X],$$

we define

$$g + h := \{s_i\}_{i=0}^\infty \quad\text{and}\quad gh := \{p_i\}_{i=0}^\infty,$$

where for $i = 0, 1, 2, \ldots$,

$$s_i := a_i + b_i \tag{7.2}$$

and

$$p_i := \sum_{i = j + k} a_j b_k, \tag{7.3}$$

the sum being over all pairs (j, k) of non-negative integers such that $i = j + k$
(which is a finite sum). We leave it to the reader to verify that $g + h$ and $gh$ are
polynomials (i.e., only finitely many of the $s_i$'s and $p_i$'s are non-zero). The reader
may also verify that all the requirements of Definition 7.1 are satisfied: the additive
identity is the all-zero sequence $\varepsilon_0(0_R)$, and the multiplicative identity is $\varepsilon_0(1_R)$.
One can easily verify that for all c,d e R, we have 

£o(c + d) = £o(c) + £o (d) and £o(cd) = £o(c)£o(d). 
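To make the coefficient rules (7.2) and (7.3) concrete, here is a minimal Python sketch for the case $R = \mathbb{Z}$. It assumes a polynomial is stored as a list of integer coefficients, lowest degree first, with the zero polynomial as the empty list; the names `trim`, `poly_add`, and `poly_mul` are illustrative, not from the text. The last lines recompute Example 7.28.

```python
from itertools import zip_longest


def trim(c):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    c = list(c)
    while c and c[-1] == 0:
        c.pop()
    return c


def poly_add(g, h):
    """Sum of g and h, following (7.2): s_i = a_i + b_i."""
    return trim(a + b for a, b in zip_longest(g, h, fillvalue=0))


def poly_mul(g, h):
    """Product of g and h, following (7.3): p_i = sum of a_j * b_k over i = j + k."""
    if not g or not h:
        return []
    p = [0] * (len(g) + len(h) - 1)
    for j, a in enumerate(g):
        for k, b in enumerate(h):
            p[j + k] += a * b
    return trim(p)


# Example 7.28, coefficients listed lowest degree first
a = [3, 0, 1]          # 3 + X^2
b = [1, 2, 0, -1]      # 1 + 2X - X^3
c, d, e, f = [5], [1, 1], [0, 1], [0, 0, 0, 4]

print(poly_add(a, b))                            # [4, 2, 1, -1]
print(poly_mul(a, b))                            # [3, 6, 1, -1, 0, -1]
print(poly_add(poly_mul(c, d), poly_mul(e, f)))  # [5, 5, 0, 0, 4]
```

Storing only finitely many coefficients mirrors the requirement that all but finitely many of the $a_i$ are zero.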

We shall identify $c \in R$ with $\varepsilon_0(c) \in R[X]$, viewing the ring element $c$ as simply “shorthand” for the polynomial $\varepsilon_0(c)$ in contexts where a polynomial is expected. Note that while $c$ and $\varepsilon_0(c)$ are not the same mathematical object, there will be no confusion in treating them as such. Thus, from a narrow, legalistic point of view, $R$ is not a subring of $R[X]$, but we shall not let such annoying details prevent us from continuing to speak of it as such. Indeed, by appropriately renaming elements, we can make $R$ a subring of $R[X]$ in the literal sense of the term.

We also define $X := \varepsilon_1(1_R)$. One can verify that $X^i = \varepsilon_i(1_R)$ for all $i \geq 0$. More generally, for any polynomial $g = \{a_i\}_{i=0}^\infty$, if $a_i = 0_R$ for all $i$ exceeding some value $k$, then we have $g = \sum_{i=0}^k \varepsilon_0(a_i) X^i$. Writing $a_i$ in place of $\varepsilon_0(a_i)$, we have $g = \sum_{i=0}^k a_i X^i$, and so we can return to the standard practice of writing polynomials as we did in Example 7.28, without any loss of precision.


7.2.2 Basic properties of polynomial rings 

Let $R$ be a ring. For non-zero $g \in R[X]$, if $g = \sum_{i=0}^k a_i X^i$ with $a_k \neq 0$, then we call $k$ the degree of $g$, denoted $\deg(g)$, we call $a_k$ the leading coefficient of $g$, denoted $\mathrm{lc}(g)$, and we call $a_0$ the constant term of $g$. If $\mathrm{lc}(g) = 1$, then $g$ is called monic.

Suppose $g = \sum_{i=0}^k a_i X^i$ and $h = \sum_{i=0}^\ell b_i X^i$ are polynomials such that $a_k \neq 0$ and $b_\ell \neq 0$, so that $\deg(g) = k$ and $\mathrm{lc}(g) = a_k$, and $\deg(h) = \ell$ and $\mathrm{lc}(h) = b_\ell$. When we multiply these two polynomials, we get
$$gh = a_0 b_0 + (a_0 b_1 + a_1 b_0) X + \cdots + a_k b_\ell X^{k+\ell}.$$

In particular, $\deg(gh) \leq \deg(g) + \deg(h)$. If either $a_k$ or $b_\ell$ is not a zero divisor, then $a_k b_\ell$ is not zero, and hence $\deg(gh) = \deg(g) + \deg(h)$. However, if both $a_k$ and $b_\ell$ are zero divisors, then we may have $a_k b_\ell = 0$, in which case the product $gh$ may be zero, or perhaps $gh \neq 0$ but $\deg(gh) < \deg(g) + \deg(h)$.
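The zero-divisor caveat is easy to observe in $\mathbb{Z}_4[X]$, where the coefficient 2 is a zero divisor. The sketch below (an illustration only; it assumes coefficient lists, lowest degree first, reduced mod $n$, and hypothetical helper names) shows both failure modes: $(2X)(2X) = 0$, and $(1 + 2X)^2 = 1$, whose degree 0 is less than $1 + 1$.

```python
def mul_mod(g, h, n):
    """Product of coefficient lists (lowest degree first) in Z_n[X]."""
    p = [0] * max(len(g) + len(h) - 1, 0)
    for j, a in enumerate(g):
        for k, b in enumerate(h):
            p[j + k] = (p[j + k] + a * b) % n
    while p and p[-1] == 0:   # leading coefficients may vanish mod n
        p.pop()
    return p


def deg(g):
    """Degree of a trimmed coefficient list; -inf for the zero polynomial."""
    return len(g) - 1 if g else float("-inf")


two_x = [0, 2]           # 2X in Z_4[X]
one_plus_two_x = [1, 2]  # 1 + 2X in Z_4[X]

print(mul_mod(two_x, two_x, 4))                    # [] : the product is zero
print(mul_mod(one_plus_two_x, one_plus_two_x, 4))  # [1]: degree 0, not 2
```

Over $\mathbb{Z}_5$, where 2 is a unit, the same product $(1 + 2X)^2$ keeps degree 2, as the degree formula predicts.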

For the zero polynomial, we establish the following conventions: its leading coefficient and constant term are defined to be $0_R$, and its degree is defined to be $-\infty$. With these conventions, we may succinctly state that

for all $g, h \in R[X]$, we have $\deg(gh) \leq \deg(g) + \deg(h)$, with equality guaranteed to hold unless the leading coefficients of both $g$ and $h$ are zero divisors.

In particular, if the leading coefficient of a polynomial is not a zero divisor, then 
the polynomial is not a zero divisor. In the case where the ring of coefficients is an 
integral domain, we can be more precise: 

Theorem 7.9. Let $D$ be an integral domain. Then:

(i) for all $g, h \in D[X]$, we have $\deg(gh) = \deg(g) + \deg(h)$;

(ii) $D[X]$ is an integral domain;

(iii) $(D[X])^* = D^*$.

Proof. Exercise. □

An extremely important property of polynomials is a division with remainder 
property, analogous to that for the integers: 

Theorem 7.10 (Division with remainder property). Let $R$ be a ring. For all $g, h \in R[X]$ with $h \neq 0$ and $\mathrm{lc}(h) \in R^*$, there exist unique $q, r \in R[X]$ such that $g = hq + r$ and $\deg(r) < \deg(h)$.

Proof. Consider the set $S := \{g - ht : t \in R[X]\}$. Let $r = g - hq$ be an element of $S$ of minimum degree. We must have $\deg(r) < \deg(h)$, since otherwise, we could subtract an appropriate multiple of $h$ from $r$ so as to eliminate the leading coefficient of $r$, obtaining
$$r' := r - h \cdot \big(\mathrm{lc}(r)\,\mathrm{lc}(h)^{-1} X^{\deg(r) - \deg(h)}\big) \in S,$$
where $\deg(r') < \deg(r)$, contradicting the minimality of $\deg(r)$.

That proves the existence of $r$ and $q$. For uniqueness, suppose that $g = hq + r$ and $g = hq' + r'$, where $\deg(r) < \deg(h)$ and $\deg(r') < \deg(h)$. This implies $r' - r = h \cdot (q - q')$. However, if $q \neq q'$, then, since $\mathrm{lc}(h)$ is a unit and hence not a zero divisor,
$$\deg(h) > \deg(r' - r) = \deg(h \cdot (q - q')) = \deg(h) + \deg(q - q') \geq \deg(h),$$
which is impossible. Therefore, we must have $q = q'$, and hence $r = r'$. □

If $g = hq + r$ as in the above theorem, we define $g \bmod h := r$. Clearly, $h \mid g$ if and only if $g \bmod h = 0$. Moreover, note that if $\deg(g) < \deg(h)$, then $q = 0$ and $r = g$; otherwise, if $\deg(g) \geq \deg(h)$, then $q \neq 0$ and $\deg(g) = \deg(h) + \deg(q)$.
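The existence proof of Theorem 7.10 is effectively an algorithm: repeatedly cancel the leading coefficient of the remainder. The following Python sketch runs it in $\mathbb{Z}_n[X]$, assuming coefficient lists (lowest degree first) and a hypothetical function name `poly_divmod`. The only inverse it needs is $\mathrm{lc}(h)^{-1}$, so it works even for composite $n$ provided $\mathrm{lc}(h)$ is a unit; Python's `pow(x, -1, n)` (Python 3.8 or later) raises `ValueError` otherwise.

```python
def trim(c):
    """Drop trailing zero coefficients in place; the zero polynomial is []."""
    while c and c[-1] == 0:
        c.pop()
    return c


def poly_divmod(g, h, n):
    """Return (q, r) with g = h*q + r and deg(r) < deg(h) in Z_n[X].

    Requires h != 0 with lc(h) a unit mod n, as in Theorem 7.10.
    """
    r = trim([c % n for c in g])
    h = trim([c % n for c in h])
    if not h:
        raise ZeroDivisionError("h must be non-zero")
    inv = pow(h[-1], -1, n)          # lc(h)^(-1); ValueError if not a unit
    dh = len(h) - 1                  # deg(h)
    q = [0] * max(len(r) - dh, 0)
    while len(r) - 1 >= dh:          # while deg(r) >= deg(h)
        shift = len(r) - 1 - dh      # deg(r) - deg(h)
        t = r[-1] * inv % n          # lc(r) * lc(h)^(-1)
        q[shift] = t
        for i, c in enumerate(h):    # r := r - h * t * X^shift
            r[i + shift] = (r[i + shift] - t * c) % n
        trim(r)                      # the leading coefficient is now 0
    return trim(q), r


# Z_6 is not a field, but lc(5X + 1) = 5 is a unit mod 6
q, r = poly_divmod([5, 2, 0, 1], [1, 5], 6)   # g = X^3 + 2X + 5
print(q, r)   # [3, 5, 5] [2], i.e. q = 5X^2 + 5X + 3 and r = 2
```

Each pass strictly lowers $\deg(r)$, exactly as in the proof, so the loop runs at most $\deg(g) - \deg(h) + 1$ times.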


7.2.3 Polynomial evaluation 

A polynomial $g = \sum_{i=0}^k a_i X^i \in R[X]$ naturally defines a polynomial function on $R$ that sends $x \in R$ to $\sum_{i=0}^k a_i x^i \in R$, and we denote the value of this function as $g(x)$ (note that “$X$” denotes an indeterminate, while “$x$” denotes an element of $R$). It is important to regard polynomials over $R$ as formal expressions, and not to identify them with their corresponding functions. In particular, two polynomials are equal if and only if their coefficients are equal, while two functions are equal if and only if their values agree at all points in $R$. This distinction is important, since there are rings $R$ over which two different polynomials define the same function. One can of course define the ring of polynomial functions on $R$, but in general, that ring has a different structure from the ring of polynomials over $R$.

Example 7.29. In the ring $\mathbb{Z}_p$, for prime $p$, by Fermat's little theorem (Theorem 2.14), we have $x^p = x$ for all $x \in \mathbb{Z}_p$. However, the polynomials $X^p$ and $X$ are not the same polynomials (in particular, the former has degree $p$, while the latter has degree 1). □
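Example 7.29 can be confirmed by brute force for small primes. In this sketch (hypothetical helper names; coefficient lists, lowest degree first), the coefficient lists of $X^p$ and $X$ differ, yet Horner evaluation agrees at every point of $\mathbb{Z}_p$.

```python
def evaluate(coeffs, x, n):
    """Value of sum(coeffs[i] * x**i) in Z_n, computed by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % n
    return acc


def x_to_the(p):
    """Coefficient list of the polynomial X^p."""
    return [0] * p + [1]


for p in (2, 3, 5, 7, 11):
    same_function = all(evaluate(x_to_the(p), x, p) == evaluate([0, 1], x, p)
                        for x in range(p))
    # Equal as functions on Z_p, different as coefficient lists
    print(p, same_function, x_to_the(p) != [0, 1])
```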

More generally, suppose $R$ is a subring of a ring $E$. Then every polynomial $g = \sum_{i=0}^k a_i X^i \in R[X]$ defines a polynomial function from $E$ to $E$ that sends $\alpha \in E$ to $\sum_{i=0}^k a_i \alpha^i \in E$, and, again, the value of this function is denoted $g(\alpha)$. We say that $\alpha$ is a root of $g$ if $g(\alpha) = 0$.

An obvious, yet important, fact is the following: 

Theorem 7.11. Let $R$ be a subring of a ring $E$. For all $g, h \in R[X]$ and $\alpha \in E$, if $s := g + h \in R[X]$ and $p := gh \in R[X]$, then we have
$$s(\alpha) = g(\alpha) + h(\alpha) \quad\text{and}\quad p(\alpha) = g(\alpha)\,h(\alpha).$$
Also, if $c \in R$ is a constant polynomial, then $c(\alpha) = c$ for all $\alpha \in E$.

Proof. The statement about evaluating a constant polynomial is clear from the definitions. The proof of the statements about evaluating the sum or product of polynomials is really just symbol pushing. Indeed, suppose $g = \sum_i a_i X^i$ and $h = \sum_i b_i X^i$. Then $s = \sum_i (a_i + b_i) X^i$, and so
$$s(\alpha) = \sum_i (a_i + b_i)\alpha^i = \sum_i a_i \alpha^i + \sum_i b_i \alpha^i = g(\alpha) + h(\alpha).$$
Also, we have
$$p = \Big(\sum_i a_i X^i\Big)\Big(\sum_j b_j X^j\Big) = \sum_{i,j} a_i b_j X^{i+j},$$
and employing the result for evaluating sums of polynomials, we have
$$p(\alpha) = \sum_{i,j} a_i b_j \alpha^{i+j} = \Big(\sum_i a_i \alpha^i\Big)\Big(\sum_j b_j \alpha^j\Big) = g(\alpha)\,h(\alpha). \quad \square$$

Example 7.30. Consider the polynomial $g := 2X^3 - 2X^2 + X - 1 \in \mathbb{Z}[X]$. We can write $g = (2X^2 + 1)(X - 1)$. For any element $\alpha$ of $\mathbb{Z}$, or an extension ring of $\mathbb{Z}$, we have $g(\alpha) = (2\alpha^2 + 1)(\alpha - 1)$. From this, it is clear that in $\mathbb{Z}$, $g$ has a root only at 1; moreover, it has no other roots in $\mathbb{R}$, but in $\mathbb{C}$, it also has roots $\pm i/\sqrt{2}$. □

Example 7.31. If $E = R[X]$, then evaluating a polynomial $g \in R[X]$ at a point $\alpha \in E$ amounts to polynomial composition. For example, if $g := X^2 + X$ and $\alpha := X + 1$, then
$$g(\alpha) = g(X + 1) = (X + 1)^2 + (X + 1) = X^2 + 3X + 2. \quad \square$$
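Example 7.31 can be checked mechanically. Horner's rule uses only ring addition and multiplication, so it works unchanged when the evaluation point is itself a polynomial. The sketch below assumes $R = \mathbb{Z}$ and coefficient lists (lowest degree first); `compose` and its helpers are hypothetical names.

```python
def trim(c):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    c = list(c)
    while c and c[-1] == 0:
        c.pop()
    return c


def poly_add(g, h):
    n = max(len(g), len(h))
    return trim((g[i] if i < len(g) else 0) + (h[i] if i < len(h) else 0)
                for i in range(n))


def poly_mul(g, h):
    if not g or not h:
        return []
    p = [0] * (len(g) + len(h) - 1)
    for j, a in enumerate(g):
        for k, b in enumerate(h):
            p[j + k] += a * b
    return trim(p)


def compose(g, alpha):
    """g(alpha) for g, alpha in Z[X], via Horner's rule:
    acc := acc * alpha + a_i, from the leading coefficient down to a_0."""
    acc = []
    for c in reversed(g):
        acc = poly_add(poly_mul(acc, alpha), [c])
    return acc


print(compose([0, 1, 1], [1, 1]))   # [2, 3, 1], i.e. X^2 + 3X + 2
```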

The reader is perhaps familiar with the fact that over the real or the complex numbers, every polynomial of degree $k$ has at most $k$ distinct roots, and the fact that every set of $k$ points can be interpolated by a unique polynomial of degree less than $k$. As we will now see, these results extend to much more general, though not completely arbitrary, coefficient rings.

Theorem 7.12. Let $R$ be a ring, $g \in R[X]$, and $x \in R$. Then there exists a unique polynomial $q \in R[X]$ such that $g = (X - x)q + g(x)$. In particular, $x$ is a root of $g$ if and only if $(X - x)$ divides $g$.

Proof. If $R$ is the trivial ring, there is nothing to prove, so assume that $R$ is non-trivial. Using the division with remainder property for polynomials (which applies because $X - x$ has leading coefficient $1$, a unit), there exist unique $q, r \in R[X]$ such that $g = (X - x)q + r$ and $\deg(r) < 1$, which means that $r \in R$. Evaluating at $x$, we see that $g(x) = (x - x)q(x) + r = r$. That proves the first statement. The second follows immediately from the first. □
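The quotient $q$ and the value $g(x)$ in Theorem 7.12 can be computed together by synthetic division, which is Horner's rule with the intermediate values kept. The same computation works over any ring; this sketch just uses Python integers, with coefficient lists (lowest degree first) and a hypothetical function name. It is checked on the polynomial of Example 7.30.

```python
def synthetic_division(g, x):
    """Return (q, g(x)) with g = (X - x)*q + g(x), for integer coefficients.

    g is a list of coefficients, lowest degree first; so is the returned q.
    """
    acc, values = 0, []
    for c in reversed(g):            # Horner's rule, from the leading coefficient down
        acc = acc * x + c
        values.append(acc)
    value = values.pop() if values else 0   # the final Horner value is g(x)
    q = values[::-1]                 # earlier values are q's coefficients, highest first
    return q, value


# Example 7.30: g = 2X^3 - 2X^2 + X - 1 = (2X^2 + 1)(X - 1)
g = [-1, 1, -2, 2]
print(synthetic_division(g, 1))   # ([1, 0, 2], 0): q = 2X^2 + 1 and g(1) = 0
print(synthetic_division(g, 2))   # ([5, 2, 2], 9): g = (X - 2)(2X^2 + 2X + 5) + 9
```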

Note that the above theorem says that $X - x$ divides $g - g(x)$, and the polynomial $q$ in the theorem may be expressed (using the notation introduced in part (ii) of Theorem 7.4) as
$$q = \frac{g - g(x)}{X - x}.$$


Theorem 7.13. Let $D$ be an integral domain, and let $x_1, \ldots, x_k$ be distinct elements of $D$. Then for every polynomial $g \in D[X]$, the elements $x_1, \ldots, x_k$ are roots of $g$ if and only if the polynomial $\prod_{i=1}^k (X - x_i)$ divides $g$.
Proof. One direction is trivial: if $\prod_{i=1}^k (X - x_i)$ divides $g$, then it is clear that each $x_i$ is a root of $g$. We prove the converse by induction on $k$. The base case $k = 1$ is just Theorem 7.12. So assume $k > 1$, and that the statement holds for $k - 1$. Let $g \in D[X]$ and let $x_1, \ldots, x_k$ be distinct roots of $g$. Since $x_k$ is a root of $g$, then by Theorem 7.12, there exists $q \in D[X]$ such that $g = (X - x_k)q$. Moreover, for each $i = 1, \ldots, k - 1$, we have
$$0 = g(x_i) = (x_i - x_k)\,q(x_i),$$
and since $x_i - x_k \neq 0$ and $D$ is an integral domain, we must have $q(x_i) = 0$. Thus, $q$ has roots $x_1, \ldots, x_{k-1}$, and by induction $\prod_{i=1}^{k-1} (X - x_i)$ divides $q$, from which it then follows that $\prod_{i=1}^k (X - x_i)$ divides $g$. □

Note that in this theorem, we can slightly weaken the hypothesis: we do not need to assume that the coefficient ring is an integral domain; rather, all we really need is that for all $i \neq j$, the difference $x_i - x_j$ is not a zero divisor.

As an immediate consequence of this theorem, we obtain: 

Theorem 7.14. Let $D$ be an integral domain, and suppose that $g \in D[X]$, with $\deg(g) = k \geq 0$. Then $g$ has at most $k$ distinct roots.

Proof. If $g$ had $k + 1$ distinct roots $x_1, \ldots, x_{k+1}$, then by the previous theorem, the polynomial $\prod_{i=1}^{k+1} (X - x_i)$, which has degree $k + 1$, would divide $g$, which is non-zero of degree $k$; by Theorem 7.9, this is an impossibility. □
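Theorem 7.14 genuinely needs the integral domain hypothesis. A brute-force root count (the helper `roots_mod` is illustrative, with integer coefficients listed lowest degree first) shows that $X^2 - 1$ has the expected two roots in the field $\mathbb{Z}_7$, but four roots in $\mathbb{Z}_8$, which is not an integral domain.

```python
def roots_mod(coeffs, n):
    """All x in Z_n with g(x) = 0, where g has integer coefficients (lowest degree first)."""
    return [x for x in range(n)
            if sum(c * pow(x, i, n) for i, c in enumerate(coeffs)) % n == 0]


g = [-1, 0, 1]          # X^2 - 1
print(roots_mod(g, 7))  # [1, 6]
print(roots_mod(g, 8))  # [1, 3, 5, 7]
```

In $\mathbb{Z}_8$, for instance, $(3 - 1)(3 + 1) = 8 = 0$ even though neither factor is zero, which is exactly the step that fails in the proof of Theorem 7.13.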

Theorem 7.15 (Lagrange interpolation). Let $F$ be a field, let $x_1, \ldots, x_k$ be distinct elements of $F$, and let $y_1, \ldots, y_k$ be arbitrary elements of $F$. Then there exists a unique polynomial $g \in F[X]$ with $\deg(g) < k$ such that $g(x_i) = y_i$ for $i = 1, \ldots, k$, namely
$$g := \sum_{i=1}^k y_i \frac{\prod_{j \neq i} (X - x_j)}{\prod_{j \neq i} (x_i - x_j)}.$$

Proof. For the existence part of the theorem, one just has to verify that $g(x_i) = y_i$ for the given $g$, which clearly has degree less than $k$. This is easy to see: for $i = 1, \ldots, k$, evaluating the $i$th term in the sum defining $g$ at $x_i$ yields $y_i$, while evaluating any other term at $x_i$ yields 0. The uniqueness part of the theorem follows almost immediately from Theorem 7.14: if $g$ and $h$ are polynomials of degree less than $k$ such that $g(x_i) = y_i = h(x_i)$ for $i = 1, \ldots, k$, then $g - h$ is a polynomial of degree less than $k$ with $k$ distinct roots, which, by the previous theorem, is impossible unless $g = h$. □

Again, we can slightly weaken the hypothesis of this theorem: we do not need to assume that the coefficient ring is a field; rather, all we really need is that for all $i \neq j$, the difference $x_i - x_j$ is a unit.
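The interpolation formula of Theorem 7.15 translates directly into code over the field $\mathbb{Z}_p$. This sketch assumes $p$ is prime and the $x_i$ are distinct mod $p$; the name `lagrange_interpolate` is hypothetical, and the result is a coefficient list of length $k$, lowest degree first (trailing zeros are kept). It recovers $g = 2 + 3X + 5X^2$ over $\mathbb{Z}_{13}$ from its values at 1, 2, 3.

```python
def lagrange_interpolate(xs, ys, p):
    """Coefficients (lowest degree first, length k) of the unique g in Z_p[X]
    with deg(g) < k and g(xs[i]) = ys[i], built from the formula in Theorem 7.15.

    Assumes p is prime and the xs are distinct mod p.
    """
    k = len(xs)
    g = [0] * k
    for i in range(k):
        num, den = [1], 1            # prod_{j != i} (X - x_j) and prod_{j != i} (x_i - x_j)
        for j in range(k):
            if j == i:
                continue
            nxt = [0] * (len(num) + 1)
            for d, c in enumerate(num):          # multiply num by (X - x_j)
                nxt[d] = (nxt[d] - xs[j] * c) % p
                nxt[d + 1] = (nxt[d + 1] + c) % p
            num = nxt
            den = den * (xs[i] - xs[j]) % p
        scale = ys[i] * pow(den, -1, p) % p      # y_i / prod_{j != i} (x_i - x_j)
        for d, c in enumerate(num):
            g[d] = (g[d] + scale * c) % p
    return g


# Recover g = 2 + 3X + 5X^2 over Z_13 from its values at 1, 2, 3
print(lagrange_interpolate([1, 2, 3], [10, 2, 4], 13))   # [2, 3, 5]
```

If two of the $x_i$ coincide mod $p$, the denominator is 0 and `pow(0, -1, p)` raises `ValueError`, matching the hypothesis that the differences $x_i - x_j$ be units.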


Exercise 7.16. Let $D$ be an infinite integral domain, and let $g, h \in D[X]$. Show that if $g(x) = h(x)$ for all $x \in D$, then $g = h$. Thus, for an infinite integral domain $D$, there is a one-to-one correspondence between polynomials over $D$ and polynomial functions on $D$.

Exercise 7.17. Let $F$ be a field.

(a) Show that for all $b \in F$, we have $b^2 = 1$ if and only if $b = \pm 1$.

(b) Show that for all $a, b \in F$, we have $a^2 = b^2$ if and only if $a = \pm b$.

(c) Show that the familiar quadratic formula holds for $F$, assuming $F$ has characteristic other than 2, so that $2_F \neq 0_F$. That is, for all $a, b, c \in F$ with $a \neq 0$, the polynomial $g := aX^2 + bX + c \in F[X]$ has a root in $F$ if and only if there exists $e \in F$ such that $e^2 = d$, where $d$ is the discriminant of $g$, defined as $d := b^2 - 4ac$, and in this case the roots of $g$ are $(-b \pm e)/2a$.

Exercise 7.18. Let $R$ be a ring, let $g \in R[X]$, with $\deg(g) = k \geq 0$, and let $x$ be an element of $R$. Show that:

(a) there exist an integer $m$, with $0 \leq m \leq k$, and a polynomial $q \in R[X]$, such that
$$g = (X - x)^m q \quad\text{and}\quad q(x) \neq 0,$$
and moreover, the values of $m$ and $q$ are uniquely determined;

(b) if we evaluate $g$ at $X + x$, we have
$$g(X + x) = \sum_{i=0}^k b_i X^i,$$
where $b_0 = \cdots = b_{m-1} = 0$ and $b_m = q(x) \neq 0$.

Let $m_x(g)$ denote the value $m$ in the previous exercise; for completeness, one can define $m_x(g) := \infty$ if $g$ is the zero polynomial. If $m_x(g) > 0$, then $x$ is called a root of $g$ of multiplicity $m_x(g)$; if $m_x(g) = 1$, then $x$ is called a simple root of $g$, and if $m_x(g) > 1$, then $x$ is called a multiple root of $g$.
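The multiplicity $m_x(g)$ can be computed by dividing by $X - x$ for as long as the value at $x$ is zero; the last quotient is the $q$ of Exercise 7.18(a). A sketch under the same assumptions as before (integer coefficients, lowest degree first, hypothetical helper names): for $g = (X - 1)^2(X + 2) = X^3 - 3X + 2$ it reports that 1 is a double root and $-2$ a simple root.

```python
def synthetic_division(g, x):
    """Return (q, g(x)) with g = (X - x)*q + g(x); lists are lowest degree first."""
    acc, values = 0, []
    for c in reversed(g):
        acc = acc * x + c
        values.append(acc)
    value = values.pop() if values else 0
    return values[::-1], value


def multiplicity(g, x):
    """Return (m, q) with g = (X - x)^m * q and q(x) != 0, for non-zero g in Z[X]."""
    if not any(g):
        raise ValueError("m_x(g) is infinite for the zero polynomial")
    m, q = 0, list(g)
    while True:
        quotient, value = synthetic_division(q, x)
        if value != 0:
            return m, q
        m, q = m + 1, quotient


g = [2, -3, 0, 1]            # X^3 - 3X + 2 = (X - 1)^2 (X + 2)
print(multiplicity(g, 1))    # (2, [2, 1]): double root, q = X + 2
print(multiplicity(g, -2))   # (1, [1, -2, 1]): simple root, q = (X - 1)^2
print(multiplicity(g, 0))    # (0, [2, -3, 0, 1]): not a root
```

The loop terminates because each division lowers the degree of the non-zero quotient, and a non-zero constant never vanishes at $x$.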

The following exercise refines Theorem 7.14, taking into account multiplicities. 

Exercise 7.19. Let $D$ be an integral domain, and suppose that $g \in D[X]$, with $\deg(g) = k \geq 0$. Show that
$$\sum_{x \in D} m_x(g) \leq k.$$
Exercise 7.20. Let $D$ be an integral domain, let $g, h \in D[X]$, and let $x \in D$. Show that $m_x(gh) = m_x(g) + m_x(h)$.


7.2.4 Multi-variate polynomials 

One can naturally generalize the notion of a polynomial in a single variable to that 
of a polynomial in several variables. 

Consider the ring $R[X]$ of polynomials over a ring $R$. If $Y$ is another indeterminate, we can form the ring $R[X][Y]$ of polynomials in $Y$ whose coefficients are themselves polynomials in $X$ over the ring $R$. One may write $R[X, Y]$ instead of $R[X][Y]$. An element of $R[X, Y]$ is called a bivariate polynomial.

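Consider a typical element $g \in R[X, Y]$, which may be written
$$g = \sum_{j=0}^{\ell} \Big(\sum_{i=0}^{k} a_{ij} X^i\Big) Y^j. \tag{7.4}$$
Rearranging terms, this may also be written as
$$g = \sum_{\substack{0 \leq i \leq k \\ 0 \leq j \leq \ell}} a_{ij} X^i Y^j, \tag{7.5}$$
or as
$$g = \sum_{i=0}^{k} \Big(\sum_{j=0}^{\ell} a_{ij} Y^j\Big) X^i. \tag{7.6}$$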

If $g$ is written as in (7.5), the terms $X^i Y^j$ are called monomials. The total degree of such a monomial $X^i Y^j$ is defined to be $i + j$, and if $g$ is non-zero, then the total degree of $g$, denoted $\mathrm{Deg}(g)$, is defined to be the maximum total degree among all monomials $X^i Y^j$ appearing in (7.5) with a non-zero coefficient $a_{ij}$. We define the total degree of the zero polynomial to be $-\infty$.

When $g$ is written as in (7.6), one sees that we can naturally view $g$ as an element of $R[Y][X]$, that is, as a polynomial in $X$ whose coefficients are polynomials in $Y$. From a strict, syntactic point of view, the rings $R[Y][X]$ and $R[X][Y]$ are not the same, but there is no harm done in blurring this distinction when convenient. We denote by $\deg_X(g)$ the degree of $g$, viewed as a polynomial in $X$, and by $\deg_Y(g)$ the degree of $g$, viewed as a polynomial in $Y$.

Example 7.32. Let us illustrate, with a particular example, the three different forms, as in (7.4), (7.5), and (7.6), of expressing a bivariate polynomial. In the ring $\mathbb{Z}[X, Y]$ we have
$$\begin{aligned} g &= (5X^2 - 3X + 4)Y + (2X^2 + 1) \\ &= 5X^2Y + 2X^2 - 3XY + 4Y + 1 \\ &= (5Y + 2)X^2 + (-3Y)X + (4Y + 1). \end{aligned}$$
We have $\mathrm{Deg}(g) = 3$, $\deg_X(g) = 2$, and $\deg_Y(g) = 1$. □
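Form (7.5) suggests a simple sparse representation: a dictionary mapping exponent pairs $(i, j)$ to the coefficient $a_{ij}$. The sketch below (a hypothetical representation and hypothetical helper names) recomputes the three degrees of Example 7.32 and extracts the coefficient of $X^2$ as a polynomial in $Y$, as in form (7.6).

```python
# Example 7.32, stored as {(i, j): a_ij}, meaning the term a_ij * X^i * Y^j
g = {(2, 1): 5, (2, 0): 2, (1, 1): -3, (0, 1): 4, (0, 0): 1}

NEG_INF = float("-inf")


def total_degree(g):
    """Deg(g): largest i + j over terms with non-zero coefficient."""
    return max((i + j for (i, j), a in g.items() if a != 0), default=NEG_INF)


def deg_x(g):
    """deg_X(g): degree of g viewed as a polynomial in X over R[Y]."""
    return max((i for (i, _), a in g.items() if a != 0), default=NEG_INF)


def deg_y(g):
    """deg_Y(g): degree of g viewed as a polynomial in Y over R[X]."""
    return max((j for (_, j), a in g.items() if a != 0), default=NEG_INF)


def coeff_of_x_power(g, i):
    """Coefficient of X^i in form (7.6), as {j: a_ij} representing a polynomial in Y."""
    return {j: a for (k, j), a in g.items() if k == i and a != 0}


print(total_degree(g), deg_x(g), deg_y(g))  # 3 2 1
print(coeff_of_x_power(g, 2))               # {1: 5, 0: 2}, i.e. 5Y + 2
```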

More generally, we can form the ring $R[X_1, \ldots, X_n]$ of multi-variate polynomials over $R$ in the variables $X_1, \ldots, X_n$. Formally, we can define this ring recursively as $R[X_1, \ldots, X_{n-1}][X_n]$, that is, the ring of polynomials in the variable $X_n$, with coefficients in $R[X_1, \ldots, X_{n-1}]$. A monomial is a term of the form $X_1^{e_1} \cdots X_n^{e_n}$, and the total degree of such a monomial is $e_1 + \cdots + e_n$. Every non-zero multi-variate polynomial $g$ can be expressed uniquely (up to a re-ordering of terms) as $a_1\mu_1 + \cdots + a_k\mu_k$, where each $a_i$ is a non-zero element of $R$, and the $\mu_i$ are distinct monomials; we define the total degree of $g$, denoted $\mathrm{Deg}(g)$, to be the maximum of the total degrees of the $\mu_i$'s. As usual, the zero polynomial is defined to have total degree $-\infty$.

Just as for bivariate polynomials, the order of the indeterminates is not important, and for every $i = 1, \ldots, n$, one can naturally view any $g \in R[X_1, \ldots, X_n]$ as a polynomial in $X_i$ over the ring $R[X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n]$, and define $\deg_{X_i}(g)$ to be the degree of $g$ when viewed in this way.

Just as polynomials in a single variable define polynomial functions, so do polynomials in several variables. If $R$ is a subring of $E$, $g \in R[X_1, \ldots, X_n]$, and $\alpha_1, \ldots, \alpha_n \in E$, we define $g(\alpha_1, \ldots, \alpha_n)$ to be the element of $E$ obtained by evaluating the expression obtained by substituting $\alpha_i$ for $X_i$ in $g$. Theorem 7.11 carries over directly to the multi-variate case.

Exercise 7.21. Let $R$ be a ring, and consider the ring of multi-variate polynomials $R[X_1, \ldots, X_n]$. For $m \geq 0$, define $H_m$ to be the subset of polynomials that can be expressed as $a_1\mu_1 + \cdots + a_k\mu_k$, where each $a_i$ belongs to $R$ and each $\mu_i$ is a monomial of total degree $m$ (by definition, $H_m$ includes the zero polynomial, and $H_0 = R$). Polynomials that belong to $H_m$ for some $m$ are called homogeneous polynomials. Show that:

(a) if $g, h \in H_m$, then $g + h \in H_m$;

(b) if $g \in H_\ell$ and $h \in H_m$, then $gh \in H_{\ell+m}$;

(c) every non-zero polynomial $g$ can be expressed uniquely as $g_0 + \cdots + g_d$, where $g_i \in H_i$ for $i = 0, \ldots, d$, $g_d \neq 0$, and $d = \mathrm{Deg}(g)$;

(d) for all polynomials $g, h$, we have $\mathrm{Deg}(gh) \leq \mathrm{Deg}(g) + \mathrm{Deg}(h)$, and if $R$ is an integral domain, then $\mathrm{Deg}(gh) = \mathrm{Deg}(g) + \mathrm{Deg}(h)$.
Exercise 7.22. Suppose that $D$ is an integral domain, and $g, h$ are non-zero, multi-variate polynomials over $D$ such that $gh$ is homogeneous. Show that $g$ and $h$ are also homogeneous.

Exercise 7.23. Let $R$ be a ring, and let $x_1, \ldots, x_n$ be elements of $R$. Show that every polynomial $g \in R[X_1, \ldots, X_n]$ can be expressed as
$$g = (X_1 - x_1)q_1 + \cdots + (X_n - x_n)q_n + g(x_1, \ldots, x_n),$$
where $q_1, \ldots, q_n \in R[X_1, \ldots, X_n]$.

Exercise 7.24. This exercise generalizes Theorem 7.14. Let $D$ be an integral domain, and let $g \in D[X_1, \ldots, X_n]$, with $\mathrm{Deg}(g) = k \geq 0$. Let $S$ be a finite, non-empty subset of $D$. Show that the number of elements $(x_1, \ldots, x_n) \in S^{\times n}$ such that $g(x_1, \ldots, x_n) = 0$ is at most $k|S|^{n-1}$.
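The bound in Exercise 7.24 can be sanity-checked by brute force on a small instance (an illustration of the statement, not a proof). Take $D = \mathbb{Z}_5$, $S = D$, $n = 2$, and $g = X_1^2 - X_2^2 = (X_1 - X_2)(X_1 + X_2)$, so $k = 2$. Its zeros are the pairs with $x_1 = \pm x_2$, and there are 9 of them, against a bound of $k|S|^{n-1} = 10$. The helper names are illustrative.

```python
from itertools import product

P = 5  # work in the field Z_5


def g(x1, x2):
    """g = X1^2 - X2^2 evaluated in Z_5."""
    return (x1 * x1 - x2 * x2) % P


def count_zeros(poly, S, n):
    """Number of points in S^n at which poly vanishes."""
    return sum(1 for point in product(S, repeat=n) if poly(*point) == 0)


k, n, S = 2, 2, range(P)
zeros = count_zeros(g, S, n)
bound = k * len(S) ** (n - 1)
print(zeros, bound)  # 9 10
```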


7.3 Ideals and quotient rings 

Definition 7.16. Let $R$ be a ring. An ideal of $R$ is an additive subgroup $I$ of $R$ such that $ar \in I$ for all $a \in I$ and $r \in R$ (i.e., $I$ is closed under multiplication by elements of $R$).

Expanding the above definition, we see that a non-empty subset $I$ of $R$ is an ideal of $R$ if and only if for all $a, b \in I$ and $r \in R$, we have
$$a + b \in I, \quad -a \in I, \quad\text{and}\quad ar \in I.$$
Since $R$ is commutative, the condition $ar \in I$ is equivalent to $ra \in I$. The condition $-a \in I$ is redundant, as it is implied by the condition $ar \in I$ with $r := -1_R$. In the case when $R$ is the ring $\mathbb{Z}$, this definition of an ideal is consistent with that given in §1.2.

Clearly, $\{0_R\}$ and $R$ are ideals of $R$. From the fact that an ideal $I$ is closed under multiplication by elements of $R$, it is easy to see that $I = R$ if and only if $1_R \in I$.

Example 7.33. For each $m \in \mathbb{Z}$, the set $m\mathbb{Z}$ is not only an additive subgroup of the ring $\mathbb{Z}$, it is also an ideal of this ring. □

Example 7.34. For each $m \in \mathbb{Z}$, the set $m\mathbb{Z}_n$ is not only an additive subgroup of the ring $\mathbb{Z}_n$, it is also an ideal of this ring. □

Example 7.35. In the previous two examples, we saw that for some rings, the notion of an additive subgroup coincides with that of an ideal. Of course, that is the exception, not the rule. Consider the ring of polynomials $R[X]$. Suppose $g$ is a non-zero polynomial in $R[X]$. The additive subgroup generated by $g$ contains only polynomials whose degrees are at most that of $g$. However, this subgroup is not an ideal, since every ideal containing $g$ must also contain $g \cdot X^i$ for all $i \geq 0$, and must therefore contain polynomials of arbitrarily high degree. □

Example 7.36. Let $R$ be a ring and $x \in R$. Consider the set
$$I := \{g \in R[X] : g(x) = 0\}.$$
It is not hard to see that $I$ is an ideal of $R[X]$. Indeed, for all $g, h \in I$ and $q \in R[X]$, we have
$$(g + h)(x) = g(x) + h(x) = 0 + 0 = 0 \quad\text{and}\quad (gq)(x) = g(x)q(x) = 0 \cdot q(x) = 0.$$
Moreover, by Theorem 7.12, we have $I = \{(X - x)q : q \in R[X]\}$. □

We next develop some general constructions of ideals. 

Theorem 7.17. Let $R$ be a ring and let $a \in R$. Then $aR := \{ar : r \in R\}$ is an ideal of $R$.

Proof. This is an easy calculation. For all $ar, ar' \in aR$ and $r'' \in R$, we have $ar + ar' = a(r + r') \in aR$ and $(ar)r'' = a(rr'') \in aR$. □

The ideal $aR$ in the previous theorem is called the ideal of $R$ generated by $a$. An ideal of this form is called a principal ideal. Since $R$ is commutative, one could also write this ideal as $Ra := \{ra : r \in R\}$. This ideal is the smallest ideal of $R$ containing $a$; that is, $aR$ contains $a$, and every ideal of $R$ that contains $a$ must contain everything in $aR$.

Corresponding to Theorems 6.11 and 6.12, we have:

Theorem 7.18. If $I_1$ and $I_2$ are ideals of a ring $R$, then so are $I_1 + I_2$ and $I_1 \cap I_2$.

Proof. We already know that $I_1 + I_2$ and $I_1 \cap I_2$ are additive subgroups of $R$, so it suffices to show that they are closed under multiplication by elements of $R$. The reader may easily verify that this is the case. □
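For $R = \mathbb{Z}$, Theorem 7.18 can be made concrete with the standard facts $a\mathbb{Z} + b\mathbb{Z} = \gcd(a, b)\mathbb{Z}$ and $a\mathbb{Z} \cap b\mathbb{Z} = \mathrm{lcm}(a, b)\mathbb{Z}$. The sketch below checks both for $a = 4$, $b = 6$ inside the window $[-60, 60]$. Letting $r$ and $s$ range over the same window is enough for these particular values, since every even $2m$ in the window equals $4 \cdot 2m + 6 \cdot (-m)$; the window size and names are assumptions for illustration only.

```python
from math import gcd

a, b, W = 4, 6, 60
window = range(-W, W + 1)


def multiples_in_window(m):
    """Elements of the ideal mZ that lie in [-W, W]."""
    return {x for x in window if x % m == 0}


# I1 + I2 = {a*r + b*s : r, s in Z}, restricted to the window
sum_ideal = {a * r + b * s for r in window for s in window} & set(window)
# I1 intersect I2, restricted to the window
intersection = multiples_in_window(a) & multiples_in_window(b)

d = gcd(a, b)              # 2
lcm_ab = a * b // d        # 12 (valid for positive a, b)
print(sum_ideal == multiples_in_window(d))          # True
print(intersection == multiples_in_window(lcm_ab))  # True
```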

Let $a_1, \ldots, a_k$ be elements of a ring $R$. The ideal $a_1R + \cdots + a_kR$ is called the ideal of $R$ generated by $a_1, \ldots, a_k$. When the ring $R$ is clear from context, one often writes $(a_1, \ldots, a_k)$ to denote this ideal. This ideal is the smallest ideal of $R$ containing $a_1, \ldots, a_k$.

Example 7.37. Let $n$ be a positive integer, and let $x$ be any integer. Define $I := \{g \in \mathbb{Z}[X] : g(x) \equiv 0 \pmod{n}\}$. We claim that $I$ is the ideal $(X - x, n)$ of $\mathbb{Z}[X]$. To see this, consider any fixed $g \in \mathbb{Z}[X]$. Using Theorem 7.12, we have $g = (X - x)q + g(x)$ for some $q \in \mathbb{Z}[X]$. Using the division with remainder property for integers, we have $g(x) = nq' + r$ for some $r \in \{0, \ldots, n - 1\}$ and $q' \in \mathbb{Z}$. Thus, $g(x) \equiv r \pmod{n}$, and if $g(x) \equiv 0 \pmod{n}$, then we must have $r = 0$, and hence $g = (X - x)q + nq' \in (X - x, n)$. Conversely, if $g \in (X - x, n)$, we can write $g = (X - x)q + nq'$ for some $q, q' \in \mathbb{Z}[X]$, and from this, it is clear that $g(x) = nq'(x) \equiv 0 \pmod{n}$. □

Let $I$ be an ideal of a ring $R$. Since $I$ is an additive subgroup of $R$, we may adopt the congruence notation in §6.3, writing $a \equiv b \pmod{I}$ to mean $a - b \in I$, and we can form the additive quotient group $R/I$ of cosets. Recall that for $a \in R$, the coset of $I$ containing $a$ is denoted $[a]_I$, and that $[a]_I = a + I = \{a + x : x \in I\}$. Also recall that addition in $R/I$ was defined in terms of addition of coset representatives; that is, for $a, b \in R$, we defined
$$[a]_I + [b]_I := [a + b]_I.$$
Theorem 6.16 ensured that this definition was unambiguous.

Our goal now is to make R/I into a ring by similarly defining multiplication in 
R/I in terms of multiplication of coset representatives. To do this, we need the 
following multiplicative analog of Theorem 6.16, which exploits in an essential 
way the fact that an ideal is closed under multiplication by elements of R; in fact, 
this is one of the main motivations for defining the notion of an ideal as we did. 

Theorem 7.19. Suppose $I$ is an ideal of a ring $R$. For all $a, a', b, b' \in R$, if $a \equiv a' \pmod{I}$ and $b \equiv b' \pmod{I}$, then $ab \equiv a'b' \pmod{I}$.

Proof. If $a = a' + x$ for some $x \in I$ and $b = b' + y$ for some $y \in I$, then $ab = a'b' + a'y + b'x + xy$. Since $I$ is closed under multiplication by elements of $R$, we see that $a'y, b'x, xy \in I$, and since $I$ is closed under addition, $a'y + b'x + xy \in I$. Hence, $ab - a'b' \in I$. □

Using this theorem we can now unambiguously define multiplication on $R/I$ as follows: for $a, b \in R$,
$$[a]_I \cdot [b]_I := [ab]_I.$$
Once that is done, it is straightforward to verify that all the properties that make $R$ a ring are inherited by $R/I$; we leave the details of this to the reader. The multiplicative identity of $R/I$ is the coset $[1_R]_I$.

The ring R/I is called the quotient ring or residue class ring of R modulo I. 
Elements of R/I may be called residue classes. 

Note that if $I = dR$, then $a \equiv b \pmod{I}$ if and only if $d \mid (a - b)$, and as a matter of notation, one may simply write this congruence as $a \equiv b \pmod{d}$. We may also write $[a]_d$ instead of $[a]_I$.

Finally, note that if I = R, then R/I is the trivial ring. 

Example 7.38. For each $n \geq 1$, the ring $\mathbb{Z}_n$ is precisely the quotient ring $\mathbb{Z}/n\mathbb{Z}$. □
Example 7.39. Let $f$ be a polynomial over a ring $R$ with $\deg(f) = \ell \geq 0$ and $\mathrm{lc}(f) \in R^*$, and consider the quotient ring $E := R[X]/fR[X]$. By the division with remainder property for polynomials (Theorem 7.10), for every $g \in R[X]$, there exists a unique polynomial $h \in R[X]$ such that $g \equiv h \pmod{f}$ and $\deg(h) < \ell$. From this, it follows that every element of $E$ can be written uniquely as $[h]_f$, where $h \in R[X]$ is a polynomial of degree less than $\ell$. Note that in this situation, we will generally prefer the more compact notation $R[X]/(f)$, instead of $R[X]/fR[X]$. □

Example 7.40. Consider the polynomial $f := X^2 + X + 1 \in \mathbb{Z}_2[X]$ and the quotient ring $E := \mathbb{Z}_2[X]/(f)$. Let us name the elements of $E$ as follows:
$$00 := [0]_f, \quad 01 := [1]_f, \quad 10 := [X]_f, \quad 11 := [X + 1]_f.$$
With this naming convention, addition of two elements in $E$ corresponds to just computing the bit-wise exclusive-or of their names. More precisely, the addition table for $E$ is the following:


   +  | 00  01  10  11
  ----+----------------
   00 | 00  01  10  11
   01 | 01  00  11  10
   10 | 10  11  00  01
   11 | 11  10  01  00


Note that 00 acts as the additive identity for $E$, and that as an additive group, $E$ is isomorphic to the additive group $\mathbb{Z}_2 \times \mathbb{Z}_2$.

As for multiplication in $E$, one has to compute the product of two polynomials, and then reduce modulo $f$. For example, to compute $10 \cdot 11$, using the identity $X^2 \equiv X + 1 \pmod{f}$, one sees that
$$X \cdot (X + 1) \equiv X^2 + X \equiv (X + 1) + X \equiv 1 \pmod{f};$$
thus, $10 \cdot 11 = 01$. The reader may verify the following multiplication table for $E$:



   ·  | 00  01  10  11
  ----+----------------
   00 | 00  00  00  00
   01 | 00  01  10  11
   10 | 00  10  11  01
   11 | 00  11  01  10


Observe that 01 acts as the multiplicative identity for $E$. Notice that every non-zero element of $E$ has a multiplicative inverse, and so $E$ is in fact a field. Observe that $E^*$ is cyclic: the reader may verify that both 10 and 11 have multiplicative order 3.

This is the first example we have seen of a finite field whose cardinality is not prime. □
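Example 7.40 maps directly onto bit operations. In the sketch below (an illustration, not notation from the text), each element is stored as a 2-bit integer whose bit $i$ is the coefficient of $X^i$, so the names 00, 01, 10, 11 are literally binary numerals. Addition is XOR, and multiplication is a carry-less product followed by one reduction step using $X^2 \equiv X + 1 \pmod{f}$; the printed rows reproduce the multiplication table above.

```python
F_BITS = 0b111  # f = X^2 + X + 1, bit i = coefficient of X^i


def gf4_add(a, b):
    """Addition in E: coefficient-wise addition mod 2 is bit-wise XOR."""
    return a ^ b


def gf4_mul(a, b):
    """Multiplication in E for a, b in {0, 1, 2, 3}."""
    prod = 0
    for i in range(2):          # carry-less (mod 2) polynomial product
        if (b >> i) & 1:
            prod ^= a << i
    if prod & 0b100:            # degree-2 term present: subtract (XOR) f once
        prod ^= F_BITS
    return prod


names = ["00", "01", "10", "11"]
for a in range(4):
    print(names[a], [names[gf4_mul(a, b)] for b in range(4)])
```

Since the product of two elements has degree at most 2, a single conditional XOR with $f$ is the whole reduction modulo $f$.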
Exercise 7.25. Show that if $F$ is a field, then the only ideals of $F$ are $\{0_F\}$ and $F$.

Exercise 7.26. Let $a, b$ be elements of a ring $R$. Show that
$$a \mid b \iff b \in aR \iff bR \subseteq aR.$$

Exercise 7.27. Let $R$ be a ring. Show that if $I$ is a non-empty subset of $R[X]$ that is closed under addition, multiplication by elements of $R$, and multiplication by $X$, then $I$ is an ideal of $R[X]$.

Exercise 7.28. Let $I$ be an ideal of $R$, and $S$ a subring of $R$. Show that $I \cap S$ is an ideal of $S$.

Exercise 7.29. Let $I$ be an ideal of $R$, and $S$ a subring of $R$. Show that $I + S$ is a subring of $R$, and that $I$ is an ideal of $I + S$.

Exercise 7.30. Let $I_1$ be an ideal of $R_1$, and $I_2$ an ideal of $R_2$. Show that $I_1 \times I_2$ is an ideal of $R_1 \times R_2$.

Exercise 7.31. Write down the multiplication table for $\mathbb{Z}_2[X]/(X^2 + X)$. Is this a field?

Exercise 7.32. Let $I$ be an ideal of a ring $R$, and let $x$ and $y$ be elements of $R$ with $x \equiv y \pmod{I}$. Let $g \in R[X]$. Show that $g(x) \equiv g(y) \pmod{I}$.

Exercise 7.33. Let $R$ be a ring, and fix $x_1, \ldots, x_n \in R$. Let
$$I := \{g \in R[X_1, \ldots, X_n] : g(x_1, \ldots, x_n) = 0\}.$$
Show that $I$ is an ideal of $R[X_1, \ldots, X_n]$, and that $I = (X_1 - x_1, \ldots, X_n - x_n)$.

Exercise 7.34. Let $p$ be a prime, and consider the ring $\mathbb{Q}^{(p)}$ (see Example 7.26). Show that every non-zero ideal of $\mathbb{Q}^{(p)}$ is of the form $(p^i)$, for some uniquely determined integer $i \geq 0$.

Exercise 7.35. Let $p$ be a prime. Show that in the ring $\mathbb{Z}[X]$, the ideal $(X, p)$ is not a principal ideal.

Exercise 7.36. Let $F$ be a field. Show that in the ring $F[X, Y]$, the ideal $(X, Y)$ is not a principal ideal.

Exercise 7.37. Let $R$ be a ring, and let $\{I_i\}_{i=0}^\infty$ be a sequence of ideals of $R$ such that $I_i \subseteq I_{i+1}$ for all $i = 0, 1, 2, \ldots$. Show that the union $\bigcup_{i=0}^\infty I_i$ is also an ideal of $R$.

Exercise 7.38. Let $R$ be a ring. An ideal $I$ of $R$ is called prime if $I \subsetneq R$ and if for all $a, b \in R$, $ab \in I$ implies $a \in I$ or $b \in I$. An ideal $I$ of $R$ is called maximal if $I \subsetneq R$ and there are no ideals $J$ of $R$ such that $I \subsetneq J \subsetneq R$. Show that:

(a) an ideal $I$ of $R$ is prime if and only if $R/I$ is an integral domain;

(b) an ideal $I$ of $R$ is maximal if and only if $R/I$ is a field;

(c) all maximal ideals of $R$ are also prime ideals.

Exercise 7.39. This exercise explores some examples of prime and maximal ideals. Show that:

(a) in the ring $\mathbb{Z}$, the ideal $\{0\}$ is prime but not maximal, and that the maximal ideals are precisely those of the form $p\mathbb{Z}$, where $p$ is prime;

(b) in an integral domain $D$, the ideal $\{0\}$ is prime, and this ideal is maximal if and only if $D$ is a field;

(c) if $p$ is a prime, then in the ring $\mathbb{Z}[X]$, the ideal $(X, p)$ is maximal, while the ideals $(X)$ and $(p)$ are prime, but not maximal;

(d) if $F$ is a field, then in the ring $F[X, Y]$, the ideal $(X, Y)$ is maximal, while the ideals $(X)$ and $(Y)$ are prime, but not maximal.

Exercise 7.40. It is a fact that every non-trivial ring $R$ contains at least one maximal ideal. Showing this in general requires some fancy set-theoretic notions. This exercise develops a simple proof in the case where $R$ is countable (see §A3).

(a) Show that if $R$ is non-trivial but finite, then it contains a maximal ideal.

(b) Assume that $R$ is countably infinite, and let $a_1, a_2, a_3, \ldots$ be an enumeration of the elements of $R$. Define a sequence of ideals $I_0, I_1, I_2, \ldots$, as follows. Set $I_0 := \{0_R\}$, and for each $i \geq 0$, define
$$I_{i+1} := \begin{cases} I_i + a_{i+1}R & \text{if } I_i + a_{i+1}R \subsetneq R; \\ I_i & \text{otherwise.} \end{cases}$$
Finally, set $I := \bigcup_{i=0}^\infty I_i$, which by Exercise 7.37 is an ideal of $R$. Show that $I$ is a maximal ideal of $R$. Hint: first, show that $I \subsetneq R$ by assuming that $1_R \in I$ and deriving a contradiction; then, show that $I$ is maximal by assuming that for some $i = 1, 2, \ldots$, we have $I \subsetneq I + a_iR \subsetneq R$, and deriving a contradiction.

Exercise 7.41. Let $R$ be a ring, and let $I$ and $J$ be ideals of $R$. With the ring-theoretic product as defined in Exercise 7.2, show that:

(a) $IJ$ is an ideal;

(b) if $I$ and $J$ are principal ideals, with $I = aR$ and $J = bR$, then $IJ = abR$, and so is also a principal ideal;

(c) $IJ \subseteq I \cap J$;

(d) if $I + J = R$, then $IJ = I \cap J$.

Exercise 7.42. Let $R$ be a subring of $E$, and $I$ an ideal of $R$. Show that the ring-theoretic product $IE$ is an ideal of $E$ that contains $I$, and is the smallest such ideal.

Exercise 7.43. Let $M$ be a maximal ideal of a ring $R$, and let $a, b \in R$. Show that if $ab \in M^2$ and $b \notin M$, then $a \in M^2$. Here, $M^2 := MM$, the ring-theoretic product.

Exercise 7.44. Let $F$ be a field, let $f \in F[X, Y]$, and let $E := F[X, Y]/(f)$. Define $V(f) := \{(x, y) \in F \times F : f(x, y) = 0\}$.

(a) Every element $\alpha$ of $E$ naturally defines a function from $V(f)$ to $F$, as follows: if $\alpha = [g]_f$, with $g \in F[X, Y]$, then for $P = (x, y) \in V(f)$, we define $\alpha(P) := g(x, y)$. Show that this definition is unambiguous, that is, $g \equiv h \pmod{f}$ implies $g(x, y) = h(x, y)$.

(b) For $P = (x, y) \in V(f)$, define $M_P := \{\alpha \in E : \alpha(P) = 0\}$. Show that $M_P$ is a maximal ideal of $E$, and that $M_P = \mu E + \nu E$, where $\mu := [X - x]_f$ and $\nu := [Y - y]_f$.

Exercise 7.45. Continuing with the previous exercise, now assume that the characteristic of $F$ is not 2, and that $f = Y^2 - \phi$, where $\phi \in F[X]$ is a non-zero polynomial with no multiple roots in $F$ (see definitions after Exercise 7.18).

(a) Show that if $P = (x, y) \in V(f)$, then so is $\bar{P} := (x, -y)$, and that $P = \bar{P} \iff y = 0 \iff \phi(x) = 0$.

(b) Let $P = (x, y) \in V(f)$ and $\mu := [X - x]_f \in E$. Show that $\mu E = M_P M_{\bar{P}}$ (the ring-theoretic product). Hint: use Exercise 7.43, and treat the cases $P = \bar{P}$ and $P \neq \bar{P}$ separately.

Exercise 7.46. Let $R$ be a ring, and $I$ an ideal of $R$. Define $\operatorname{Rad}(I)$ to be the set of all $a \in R$ such that $a^n \in I$ for some positive integer $n$.

(a) Show that $\operatorname{Rad}(I)$ is an ideal of $R$ containing $I$. Hint: show that if $a^n \in I$ and $b^m \in I$, then $(a + b)^{n+m} \in I$.

(b) Show that if $R = \mathbb{Z}$ and $I = (d)$, where $d = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $d$, then $\operatorname{Rad}(I) = (p_1 \cdots p_r)$.
7.4 Ring homomorphisms and isomorphisms
Definition 7.20. A function $\rho$ from a ring $R$ to a ring $R'$ is called a ring homomorphism if

(i) $\rho$ is a group homomorphism with respect to the underlying additive groups of $R$ and $R'$,

(ii) $\rho(ab) = \rho(a)\rho(b)$ for all $a, b \in R$, and

(iii) $\rho(1_R) = 1_{R'}$.

Expanding the definition, the requirements that $\rho$ must satisfy in order to be a ring homomorphism are that for all $a, b \in R$, we have $\rho(a + b) = \rho(a) + \rho(b)$ and $\rho(ab) = \rho(a)\rho(b)$, and that $\rho(1_R) = 1_{R'}$.

Note that some texts do not require that a ring homomorphism satisfy part (iii) of our definition (which is not redundant; see Examples 7.49 and 7.50 below). Since a ring homomorphism is also an additive group homomorphism, we use the same notation and terminology for image and kernel.

Example 7.41. If $S$ is a subring of a ring $R$, then the inclusion map $i : S \to R$ is obviously a ring homomorphism. □

Example 7.42. Suppose $I$ is an ideal of a ring $R$. Analogous to Example 6.36, we may define the natural map from the ring $R$ to the quotient ring $R/I$ as follows:

$$\rho : R \to R/I, \qquad a \mapsto [a]_I.$$

Not only is this a surjective homomorphism of additive groups, with kernel $I$, it is a ring homomorphism. Indeed, we have

$$\rho(ab) = [ab]_I = [a]_I \cdot [b]_I = \rho(a) \cdot \rho(b),$$

and $\rho(1_R) = [1_R]_I$, which is the multiplicative identity in $R/I$. □

Example 7.43. For a given positive integer $n$, the natural map from $\mathbb{Z}$ to $\mathbb{Z}_n$ sends $a \in \mathbb{Z}$ to the residue class $[a]_n$. This is a surjective ring homomorphism, whose kernel is $n\mathbb{Z}$. □

Example 7.44. Let $R$ be a subring of a ring $E$, and fix $\alpha \in E$. The polynomial evaluation map

$$\rho : R[X] \to E, \qquad g \mapsto g(\alpha)$$

is a ring homomorphism (see Theorem 7.11). The image of $\rho$ consists of all polynomial expressions in $\alpha$ with coefficients in $R$, and is denoted $R[\alpha]$. As the reader may verify, $R[\alpha]$ is a subring of $E$ containing $\alpha$ and all of $R$, and is the smallest such subring of $E$. □

Example 7.45. We can generalize the previous example to multi-variate polynomials. If $R$ is a subring of a ring $E$ and $\alpha_1, \ldots, \alpha_n \in E$, then the map

$$\rho : R[X_1, \ldots, X_n] \to E, \qquad g \mapsto g(\alpha_1, \ldots, \alpha_n)$$

is a ring homomorphism. Its image consists of all polynomial expressions in $\alpha_1, \ldots, \alpha_n$ with coefficients in $R$, and is denoted $R[\alpha_1, \ldots, \alpha_n]$. Moreover, this image is a subring of $E$ containing $\alpha_1, \ldots, \alpha_n$ and all of $R$, and is the smallest such subring of $E$. Note that $R[\alpha_1, \ldots, \alpha_n] = R[\alpha_1, \ldots, \alpha_{n-1}][\alpha_n]$. □


Example 7.46. Let $\rho : R \to R'$ be a ring homomorphism. We can extend the domain of definition of $\rho$ from $R$ to $R[X]$ by defining $\rho(\sum_i a_i X^i) := \sum_i \rho(a_i) X^i$. This yields a ring homomorphism from $R[X]$ into $R'[X]$. To verify this, suppose $g = \sum_i a_i X^i$ and $h = \sum_i b_i X^i$ are polynomials in $R[X]$. Let $s := g + h \in R[X]$ and $p := gh \in R[X]$, and write $s = \sum_i s_i X^i$ and $p = \sum_i p_i X^i$, so that

$$s_i = a_i + b_i \quad \text{and} \quad p_i = \sum_{i = j + k} a_j b_k.$$

Then we have

$$\rho(s_i) = \rho(a_i + b_i) = \rho(a_i) + \rho(b_i),$$

which is the coefficient of $X^i$ in $\rho(g) + \rho(h)$, and

$$\rho(p_i) = \rho\Big(\sum_{i = j + k} a_j b_k\Big) = \sum_{i = j + k} \rho(a_j b_k) = \sum_{i = j + k} \rho(a_j)\rho(b_k),$$

which is the coefficient of $X^i$ in $\rho(g)\rho(h)$.

Sometimes a more compact notation is convenient: we may prefer to write $\bar{a}$ for the image of $a \in R$ under $\rho$, and if we do this, then for $g = \sum_i a_i X^i \in R[X]$, we write $\bar{g}$ for the image $\sum_i \bar{a}_i X^i$ of $g$ under the extension of $\rho$ to $R[X]$. □


Example 7.47. Consider the natural map that sends $a \in \mathbb{Z}$ to $\bar{a} := [a]_n \in \mathbb{Z}_n$ (see Example 7.43). As in the previous example, we may extend this to a ring homomorphism from $\mathbb{Z}[X]$ to $\mathbb{Z}_n[X]$ that sends $g = \sum_i a_i X^i \in \mathbb{Z}[X]$ to $\bar{g} = \sum_i \bar{a}_i X^i \in \mathbb{Z}_n[X]$. This homomorphism is clearly surjective. Let us determine its kernel. Observe that if $g = \sum_i a_i X^i$, then $\bar{g} = 0$ if and only if $n \mid a_i$ for each $i$; therefore, the kernel is the ideal $n\mathbb{Z}[X]$ of $\mathbb{Z}[X]$. □
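The coefficient-wise extension of Examples 7.46 and 7.47 is easy to spot-check by machine. The following Python sketch is only an illustration under our own conventions: a polynomial $\sum_i a_i X^i$ is stored as a list whose entry at index $i$ is $a_i$, and the helper names (`poly_add`, `poly_mul`, `reduce_mod`, `respects_ring_ops`, `in_kernel`) are hypothetical. It checks that reducing coefficients modulo $n$ before or after a ring operation gives the same element of $\mathbb{Z}_n[X]$, and that membership in the kernel amounts to $n$ dividing every coefficient.

```python
def poly_add(g, h):
    """Add integer polynomials given as coefficient lists (index i holds the coefficient of X^i)."""
    size = max(len(g), len(h))
    g = g + [0] * (size - len(g))
    h = h + [0] * (size - len(h))
    return [a + b for a, b in zip(g, h)]


def poly_mul(g, h):
    """Multiply integer polynomials: the X^i coefficient is the sum of a_j * b_k over i = j + k."""
    if not g or not h:
        return []
    p = [0] * (len(g) + len(h) - 1)
    for j, a in enumerate(g):
        for k, b in enumerate(h):
            p[j + k] += a * b
    return p


def reduce_mod(g, n):
    """Apply the natural map Z -> Z_n to every coefficient, i.e. g -> g-bar."""
    return [a % n for a in g]


def respects_ring_ops(g, h, n):
    """Reducing before or after adding/multiplying should give the same polynomial over Z_n."""
    gb, hb = reduce_mod(g, n), reduce_mod(h, n)
    same_sum = reduce_mod(poly_add(g, h), n) == reduce_mod(poly_add(gb, hb), n)
    same_product = reduce_mod(poly_mul(g, h), n) == reduce_mod(poly_mul(gb, hb), n)
    return same_sum and same_product


def in_kernel(g, n):
    """g-bar = 0 exactly when n divides every coefficient of g (Example 7.47)."""
    return all(c == 0 for c in reduce_mod(g, n))


print(respects_ring_ops([3, -5, 7], [2, 4], 6))          # True
print(in_kernel([6, -12, 18], 6), in_kernel([6, 1], 6))  # True False
```

Python's `%` returns a non-negative result for a positive modulus, so negative coefficients land in the standard range $0, \ldots, n - 1$.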
Example 7.48. Let $R$ be a ring of prime characteristic $p$. For all $a, b \in R$, we have (see Exercise 7.1)

$$(a + b)^p = \sum_{k=0}^{p} \binom{p}{k} a^{p-k} b^k.$$

However, by Exercise 1.14, all of the binomial coefficients are multiples of $p$, except for $k = 0$ and $k = p$, and hence in the ring $R$, all of the terms with $0 < k < p$ vanish, leaving us with

$$(a + b)^p = a^p + b^p.$$

This result is often jokingly referred to as the “freshman’s dream,” for somewhat obvious reasons.

Of course, as always, we have

$$(ab)^p = a^p b^p \quad \text{and} \quad 1_R^p = 1_R,$$

and so it follows that the map that sends $a \in R$ to $a^p \in R$ is a ring homomorphism from $R$ into $R$. □
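To see the freshman’s dream at work in a ring of characteristic $p$ other than $\mathbb{Z}_p$ itself, the sketch below tests $(g + h)^p = g^p + h^p$ in $\mathbb{Z}_p[X]$, after first confirming the divisibility fact from Exercise 1.14 for a prime and showing that it fails for the composite $4$. Polynomials are coefficient lists (index $i$ holds the coefficient of $X^i$); this representation and every helper name are assumptions made for the example, powers are computed by naive repeated multiplication, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def middle_binomials_divisible(p):
    """True when p divides C(p, k) for every 0 < k < p (true for primes, by Exercise 1.14)."""
    return all(comb(p, k) % p == 0 for k in range(1, p))


def strip(f):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_add_mod(g, h, p):
    """Add two coefficient lists, reducing each coefficient modulo p."""
    size = max(len(g), len(h))
    g, h = g + [0] * (size - len(g)), h + [0] * (size - len(h))
    return [(a + b) % p for a, b in zip(g, h)]


def poly_mul_mod(g, h, p):
    """Multiply two coefficient lists, reducing each coefficient modulo p."""
    out = [0] * (len(g) + len(h) - 1)
    for j, a in enumerate(g):
        for k, b in enumerate(h):
            out[j + k] = (out[j + k] + a * b) % p
    return out


def poly_pow_mod(g, e, p):
    """g^e by repeated multiplication (fine for the small exponents used here)."""
    result = [1]
    for _ in range(e):
        result = poly_mul_mod(result, g, p)
    return result


def freshman_dream_holds(g, h, p):
    """Compare (g + h)^p with g^p + h^p, coefficients reduced modulo p."""
    lhs = poly_pow_mod(poly_add_mod(g, h, p), p, p)
    rhs = poly_add_mod(poly_pow_mod(g, p, p), poly_pow_mod(h, p, p), p)
    return strip(lhs) == strip(rhs)


print(middle_binomials_divisible(7), middle_binomials_divisible(4))  # True False
print(freshman_dream_holds([1, 2], [0, 3, 4], 5))                     # True
print(freshman_dream_holds([1], [1], 4))                              # False
```

The last line is a reminder that the identity depends on prime characteristic: in $\mathbb{Z}_4$, $(1 + 1)^4 = 0$ while $1^4 + 1^4 = 2$.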

Example 7.49. Suppose $R$ is a non-trivial ring, and let $\rho : R \to R$ map everything in $R$ to $0_R$. Then $\rho$ satisfies parts (i) and (ii) of Definition 7.20, but not part (iii). □

Example 7.50. In special situations, part (iii) of Definition 7.20 may be redundant. One such situation arises when $\rho : R \to R'$ is surjective. In this case, we know that $1_{R'} = \rho(a)$ for some $a \in R$, and by part (ii) of the definition, we have

$$\rho(1_R) = \rho(1_R) \cdot 1_{R'} = \rho(1_R)\rho(a) = \rho(1_R \cdot a) = \rho(a) = 1_{R'}. \quad □$$

For a ring homomorphism $\rho : R \to R'$, all of the results of Theorem 6.19 apply. In particular, $\rho(0_R) = 0_{R'}$, $\rho(a) = \rho(b)$ if and only if $a \equiv b \pmod{\operatorname{Ker} \rho}$, and $\rho$ is injective if and only if $\operatorname{Ker} \rho = \{0_R\}$. However, we may strengthen Theorem 6.19 as follows:

Theorem 7.21. Let $\rho : R \to R'$ be a ring homomorphism.

(i) If $S$ is a subring of $R$, then $\rho(S)$ is a subring of $R'$; in particular (setting $S := R$), $\operatorname{Im} \rho$ is a subring of $R'$.

(ii) If $S'$ is a subring of $R'$, then $\rho^{-1}(S')$ is a subring of $R$.

(iii) If $I$ is an ideal of $R$, then $\rho(I)$ is an ideal of $\operatorname{Im} \rho$.

(iv) If $I'$ is an ideal of $\operatorname{Im} \rho$, then $\rho^{-1}(I')$ is an ideal of $R$; in particular (setting $I' := \{0_{R'}\}$), $\operatorname{Ker} \rho$ is an ideal of $R$.

Proof. In each part, we already know that the relevant object is an additive subgroup, and so it suffices to show that the appropriate additional properties are satisfied.

(i) For all $a, b \in S$, we have $ab \in S$, and hence $\rho(S)$ contains $\rho(ab) = \rho(a)\rho(b)$. Also, $1_R \in S$, and hence $\rho(S)$ contains $\rho(1_R) = 1_{R'}$.

(ii) If $\rho(a) \in S'$ and $\rho(b) \in S'$, then $\rho(ab) = \rho(a)\rho(b) \in S'$. Moreover, $\rho(1_R) = 1_{R'} \in S'$.

(iii) For all $a \in I$ and $r \in R$, we have $ar \in I$, and hence $\rho(I)$ contains $\rho(ar) = \rho(a)\rho(r)$.

(iv) For all $a \in \rho^{-1}(I')$ and $r \in R$, we have $\rho(ar) = \rho(a)\rho(r)$, and since $\rho(a)$ belongs to the ideal $I'$, so does $\rho(a)\rho(r)$, and hence $\rho^{-1}(I')$ contains $ar$. □

Theorems 6.20 and 6.21 have natural ring analogs: one only has to show that the corresponding group homomorphisms satisfy the additional requirements of a ring homomorphism, which we leave to the reader to verify:

Theorem 7.22. If $\rho : R \to R'$ and $\rho' : R' \to R''$ are ring homomorphisms, then so is their composition $\rho' \circ \rho : R \to R''$.

Theorem 7.23. Let $\rho_i : R \to R'_i$, for $i = 1, \ldots, k$, be ring homomorphisms. Then the map

$$\rho : R \to R'_1 \times \cdots \times R'_k, \qquad a \mapsto (\rho_1(a), \ldots, \rho_k(a))$$

is a ring homomorphism.

If a ring homomorphism $\rho : R \to R'$ is a bijection, then it is called a ring isomorphism of $R$ with $R'$. If such a ring isomorphism $\rho$ exists, we say that $R$ is isomorphic to $R'$, and write $R \cong R'$. Moreover, if $R = R'$, then $\rho$ is called a ring automorphism on $R$.

Analogous to Theorem 6.22, we have: 

Theorem 7.24. If $\rho$ is a ring isomorphism of $R$ with $R'$, then the inverse function $\rho^{-1}$ is a ring isomorphism of $R'$ with $R$.

Proof. Exercise. □ 

Because of this theorem, if $R$ is isomorphic to $R'$, we may simply say that “$R$ and $R'$ are isomorphic.” We stress that a ring isomorphism is essentially just a “renaming” of elements; in particular, we have:

Theorem 7.25. Let $\rho : R \to R'$ be a ring isomorphism.

(i) For all $a \in R$, $a$ is a zero divisor if and only if $\rho(a)$ is a zero divisor.

(ii) For all $a \in R$, $a$ is a unit if and only if $\rho(a)$ is a unit.

(iii) The restriction of $\rho$ to $R^*$ is a group isomorphism of $R^*$ with $(R')^*$.
Proof. Exercise. □

An injective ring homomorphism $\rho : R \to E$ is called an embedding of $R$ in $E$. In this case, $\operatorname{Im} \rho$ is a subring of $E$ and $R \cong \operatorname{Im} \rho$. If the embedding is a natural one that is clear from context, we may simply identify elements of $R$ with their images in $E$ under the embedding; that is, for $a \in R$, we may simply write “$a$,” and it is understood that this really means “$\rho(a)$” if the context demands an element of $E$. As a slight abuse of terminology, we shall say that $R$ is a subring of $E$. Indeed, by appropriately renaming elements, we can always make $R$ a subring of $E$ in the literal sense of the term.

This practice of identifying elements of a ring with their images in another ring under a natural embedding is very common. We have already seen an example of this: when we formally defined the ring of polynomials $R[X]$ over $R$ in §7.2.1, we defined the map $\varepsilon_0 : R \to R[X]$ that sends $c \in R$ to the polynomial whose constant term is $c$, with all other coefficients zero. This map $\varepsilon_0$ is an embedding, and it was via this embedding that we identified elements of $R$ with elements of $R[X]$, and so viewed $R$ as a subring of $R[X]$. We shall see more examples of this later (in particular, Example 7.55 below).

Theorems 6.23 and 6.24 also have natural ring analogs; again, one only has to show that the corresponding group homomorphisms are also ring homomorphisms:


Theorem 7.26 (First isomorphism theorem). Let $\rho : R \to R'$ be a ring homomorphism with kernel $K$ and image $S'$. Then we have a ring isomorphism

$$R/K \cong S'.$$

Specifically, the map

$$\bar{\rho} : R/K \to R', \qquad [a]_K \mapsto \rho(a)$$

is an injective ring homomorphism whose image is $S'$.


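Theorem 7.27. Let $\rho : R \to R'$ be a ring homomorphism. Then for every ideal $I$ of $R$ with $I \subseteq \operatorname{Ker} \rho$, we may define a ring homomorphism

$$\bar{\rho} : R/I \to R', \qquad [a]_I \mapsto \rho(a).$$

Moreover, $\operatorname{Im} \bar{\rho} = \operatorname{Im} \rho$, and $\bar{\rho}$ is injective if and only if $I = \operatorname{Ker} \rho$.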


Example 7.51. Returning again to the Chinese remainder theorem and the discussion in Example 6.48, if $\{n_i\}_{i=1}^{k}$ is a pairwise relatively prime family of positive integers, and $n := \prod_{i=1}^{k} n_i$, then the map

$$\rho : \mathbb{Z} \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}, \qquad a \mapsto ([a]_{n_1}, \ldots, [a]_{n_k})$$

is not just a surjective group homomorphism with kernel $n\mathbb{Z}$, it is also a ring homomorphism. Applying Theorem 7.26, we get a ring isomorphism

$$\bar{\rho} : \mathbb{Z}_n \to \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}, \qquad [a]_n \mapsto ([a]_{n_1}, \ldots, [a]_{n_k}),$$

which is the same function as the function $\theta$ in Theorem 2.8. By part (iii) of Theorem 7.25, the restriction of $\theta$ to $\mathbb{Z}_n^*$ is a group isomorphism of $\mathbb{Z}_n^*$ with the multiplicative group of units of $\mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$, which (according to Example 7.15) is $\mathbb{Z}_{n_1}^* \times \cdots \times \mathbb{Z}_{n_k}^*$. Thus, part (iii) of Theorem 2.8 is an immediate consequence of the above observations. □
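A brute-force check of Example 7.51 for small moduli may make the isomorphism concrete. In this sketch (helper names hypothetical; `math.prod` needs Python 3.8 or later), `theta` computes $([a]_{n_1}, \ldots, [a]_{n_k})$, and `check_crt_ring_isomorphism` verifies, by exhausting all of $\mathbb{Z}_n$, that the map is a bijection, preserves addition, multiplication, and $1$, and sends units to tuples of units. It is meant only for tiny $n$, since it examines all $n^2$ pairs.

```python
import operator
from itertools import combinations
from math import gcd, prod


def theta(a, moduli):
    """The CRT map [a]_n -> ([a]_{n_1}, ..., [a]_{n_k})."""
    return tuple(a % m for m in moduli)


def componentwise(op, u, v, moduli):
    """Apply op in each factor Z_{n_i} of the direct product."""
    return tuple(op(x, y) % m for x, y, m in zip(u, v, moduli))


def check_crt_ring_isomorphism(moduli):
    """Exhaustively check that theta is a bijective ring homomorphism that matches units."""
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise relatively prime")
    n = prod(moduli)
    # n distinct images in a codomain of size n means theta is a bijection.
    bijective = len({theta(a, moduli) for a in range(n)}) == n
    preserves_ops = all(
        theta(a + b, moduli) == componentwise(operator.add, theta(a, moduli), theta(b, moduli), moduli)
        and theta(a * b, moduli) == componentwise(operator.mul, theta(a, moduli), theta(b, moduli), moduli)
        for a in range(n)
        for b in range(n)
    )
    preserves_one = theta(1, moduli) == tuple(1 % m for m in moduli)
    # [a]_n is a unit exactly when every component [a]_{n_i} is a unit.
    units_match = all(
        (gcd(a, n) == 1) == all(gcd(r, m) == 1 for r, m in zip(theta(a, moduli), moduli))
        for a in range(n)
    )
    return bijective and preserves_ops and preserves_one and units_match


print(theta(23, [3, 4, 5]))                   # (2, 3, 3)
print(check_crt_ring_isomorphism([3, 4, 5]))  # True
```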

Example 7.52. Extending Example 6.49, if $n_1$ and $n_2$ are positive integers with $n_1 \mid n_2$, then the map

$$\rho : \mathbb{Z}_{n_2} \to \mathbb{Z}_{n_1}, \qquad [a]_{n_2} \mapsto [a]_{n_1}$$

is a surjective ring homomorphism. □

Example 7.53. For a ring $R$, consider the map $\rho : \mathbb{Z} \to R$ that sends $m \in \mathbb{Z}$ to $m \cdot 1_R$ in $R$. It is easily verified that $\rho$ is a ring homomorphism. Since $\operatorname{Ker} \rho$ is an ideal of $\mathbb{Z}$, it is either $\{0\}$ or of the form $n\mathbb{Z}$ for some $n > 0$. In the first case, if $\operatorname{Ker} \rho = \{0\}$, then $\operatorname{Im} \rho \cong \mathbb{Z}$, and so the ring $\mathbb{Z}$ is embedded in $R$, and $R$ has characteristic zero. In the second case, if $\operatorname{Ker} \rho = n\mathbb{Z}$ for some $n > 0$, then by Theorem 7.26, $\operatorname{Im} \rho \cong \mathbb{Z}_n$, and so the ring $\mathbb{Z}_n$ is embedded in $R$, and $R$ has characteristic $n$.

Note that $\operatorname{Im} \rho$ is the smallest subring of $R$: any subring of $R$ must contain $1_R$ and be closed under addition and subtraction, and so must contain $\operatorname{Im} \rho$. □

Example 7.54. We can generalize Example 7.44 by evaluating polynomials at several points. This is most fruitful when the underlying coefficient ring is a field, and the evaluation points belong to the same field. So let $F$ be a field, and let $x_1, \ldots, x_k$ be distinct elements of $F$. Define the map

$$\rho : F[X] \to F^{\times k}, \qquad g \mapsto (g(x_1), \ldots, g(x_k)).$$

This is a ring homomorphism (as seen by applying Theorem 7.23 to the polynomial evaluation maps at the points $x_1, \ldots, x_k$). By Theorem 7.13, $\operatorname{Ker} \rho = (f)$, where $f := \prod_{i=1}^{k} (X - x_i)$. By Theorem 7.15, $\rho$ is surjective. Therefore, by Theorem 7.26, we get a ring isomorphism

$$\bar{\rho} : F[X]/(f) \to F^{\times k}, \qquad [g]_f \mapsto (g(x_1), \ldots, g(x_k)). \quad □$$
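Over the field $\mathbb{Z}_p$ ($p$ prime), surjectivity of $\rho$ can be illustrated by Lagrange interpolation: given distinct $x_1, \ldots, x_k$ and arbitrary targets $y_1, \ldots, y_k$, it produces $g$ of degree less than $k$ with $g(x_i) = y_i$. We have not assumed that this is the argument behind Theorem 7.15; it is simply one standard construction. The sketch also builds $f = \prod_i (X - x_i)$ and confirms that it vanishes at every $x_i$. Restricting to $\mathbb{Z}_p$, the coefficient-list representation, and all helper names are assumptions; modular inverses use `pow(x, -1, p)`, available from Python 3.8.

```python
def poly_eval(g, x, p):
    """Evaluate g (index i = coefficient of X^i) at x over Z_p, by Horner's rule."""
    acc = 0
    for c in reversed(g):
        acc = (acc * x + c) % p
    return acc


def times_linear(g, r, p):
    """Return g * (X - r) over Z_p."""
    out = [0] * (len(g) + 1)
    for i, c in enumerate(g):
        out[i] = (out[i] - r * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out


def vanishing_poly(xs, p):
    """f = prod_i (X - x_i), the generator of the kernel of the evaluation map."""
    f = [1]
    for x in xs:
        f = times_linear(f, x, p)
    return f


def interpolate(xs, ys, p):
    """Lagrange interpolation over Z_p: some g with deg g < k and g(x_i) = y_i for all i."""
    if len({x % p for x in xs}) != len(xs):
        raise ValueError("evaluation points must be distinct modulo p")
    k = len(xs)
    g = [0] * k
    for i in range(k):
        basis, denom = [1], 1
        for j in range(k):
            if j != i:
                basis = times_linear(basis, xs[j], p)   # prod_{j != i} (X - x_j)
                denom = denom * (xs[i] - xs[j]) % p     # that product evaluated at x_i
        scale = ys[i] * pow(denom, -1, p) % p
        for d, c in enumerate(basis):
            g[d] = (g[d] + scale * c) % p
    return g


xs, ys, p = [1, 2, 3], [4, 0, 6], 7
g = interpolate(xs, ys, p)
print([poly_eval(g, x, p) for x in xs])                      # [4, 0, 6]
print([poly_eval(vanishing_poly(xs, p), x, p) for x in xs])  # [0, 0, 0]
```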

Example 7.55. As in Example 7.39, let $f$ be a polynomial over a ring $R$ with $\deg(f) = \ell$ and $\operatorname{lc}(f) \in R^*$, but now assume that $\ell > 0$. Consider the natural map $\rho$ from $R[X]$ to the quotient ring $E := R[X]/(f)$ that sends $g \in R[X]$ to $[g]_f$. Let $\tau$ be the restriction of $\rho$ to the subring $R$ of $R[X]$. Evidently, $\tau$ is a ring homomorphism from $R$ into $E$. Moreover, since $\ell > 0$, the elements of $R$ are polynomials of degree less than $\ell$, and distinct polynomials of degree less than $\ell$ belong to distinct residue classes modulo $f$; so we see that $\tau$ is injective. Thus, $\tau$ is an embedding of $R$ into $E$. As $\tau$ is a very natural embedding, we can identify elements of $R$ with their images in $E$ under $\tau$, and regard $R$ as a subring of $E$. Taking this point of view, we see that if $g = \sum_i a_i X^i$, then

$$[g]_f = \Big[\sum_i a_i X^i\Big]_f = \sum_i [a_i]_f \, ([X]_f)^i = \sum_i a_i \xi^i = g(\xi),$$

where $\xi := [X]_f \in E$. Therefore, the natural map $\rho$ may be viewed as the polynomial evaluation map (see Example 7.44) that sends $g \in R[X]$ to $g(\xi) \in E$.

Note that we have $E = R[\xi]$; moreover, every element of $E$ can be expressed uniquely as $g(\xi)$ for some $g \in R[X]$ of degree less than $\ell$, and more generally, for arbitrary $g, h \in R[X]$, we have $g(\xi) = h(\xi)$ if and only if $g \equiv h \pmod{f}$. Finally, note that $f(\xi) = [f]_f = [0]_f$; that is, $\xi$ is a root of $f$. □

Example 7.56. As a special case of Example 7.55, let $f := X^2 + 1 \in \mathbb{R}[X]$, and consider the quotient ring $\mathbb{R}[X]/(f)$. If we set $i := [X]_f \in \mathbb{R}[X]/(f)$, then every element of $\mathbb{R}[X]/(f)$ can be expressed uniquely as $a + bi$, where $a, b \in \mathbb{R}$. Moreover, we have $i^2 = -1$, and more generally, for all $a, b, a', b' \in \mathbb{R}$, we have

$$(a + bi) + (a' + b'i) = (a + a') + (b + b')i$$

and

$$(a + bi) \cdot (a' + b'i) = (aa' - bb') + (ab' + a'b)i.$$

Thus, the rules for arithmetic in $\mathbb{R}[X]/(f)$ are precisely the familiar rules of complex arithmetic, and so $\mathbb{C}$ and $\mathbb{R}[X]/(f)$ are essentially the same, as rings. Indeed, the “algebraically correct” way of defining the field of complex numbers $\mathbb{C}$ is simply to define it to be the quotient ring $\mathbb{R}[X]/(f)$ in the first place. This will be our point of view from now on. □
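The identification of $\mathbb{R}[X]/(X^2 + 1)$ with $\mathbb{C}$ can be checked on small examples. The sketch below (helper names hypothetical) multiplies $a + bX$ and $a' + b'X$ as ordinary polynomials and then reduces modulo $X^2 + 1$, using the fact that $X^k$ reduces to $1, X, -1, -X$ according to $k \bmod 4$; the result is compared with Python's built-in `complex` type. Integer coefficients are used so that the floating-point comparison is exact.

```python
def reduce_mod_x2_plus_1(coeffs):
    """Reduce sum_k c_k X^k modulo X^2 + 1 to a pair (a, b) standing for a + b*i.

    Since X^2 = -1 in the quotient, X^k reduces cyclically to 1, X, -1, -X.
    """
    a = b = 0
    for k, c in enumerate(coeffs):
        r = k % 4
        if r == 0:
            a += c
        elif r == 1:
            b += c
        elif r == 2:
            a -= c
        else:
            b -= c
    return (a, b)


def mul_in_quotient(u, v):
    """Multiply a + bX and a' + b'X as polynomials, then reduce modulo X^2 + 1."""
    (a, b), (a2, b2) = u, v
    return reduce_mod_x2_plus_1([a * a2, a * b2 + b * a2, b * b2])


u, v = (1, 2), (3, 4)
print(mul_in_quotient(u, v))        # (-5, 10)
print(complex(*u) * complex(*v))    # (-5+10j)
```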
Example 7.57. Consider the polynomial evaluation map

$$\rho : \mathbb{R}[X] \to \mathbb{C} = \mathbb{R}[X]/(X^2 + 1), \qquad g \mapsto g(-i).$$

For every $g \in \mathbb{R}[X]$, we may write $g = (X^2 + 1)q + a + bX$, where $q \in \mathbb{R}[X]$ and $a, b \in \mathbb{R}$. Since $(-i)^2 + 1 = i^2 + 1 = 0$, we have

$$g(-i) = ((-i)^2 + 1)\,q(-i) + a - bi = a - bi.$$

Clearly, then, $\rho$ is surjective and the kernel of $\rho$ is the ideal of $\mathbb{R}[X]$ generated by the polynomial $X^2 + 1$. By Theorem 7.26, we therefore get a ring automorphism $\bar{\rho}$ on $\mathbb{C}$ that sends $a + bi \in \mathbb{C}$ to $a - bi$. In fact, $\bar{\rho}$ is none other than the complex conjugation map. Indeed, this is the “algebraically correct” way of defining complex conjugation in the first place. □

Example 7.58. We defined the ring $\mathbb{Z}[i]$ of Gaussian integers in Example 7.25 as a subring of $\mathbb{C}$. Let us verify that the notation $\mathbb{Z}[i]$ introduced in Example 7.25 is consistent with that introduced in Example 7.44. Consider the polynomial evaluation map $\rho : \mathbb{Z}[X] \to \mathbb{C}$ that sends $g \in \mathbb{Z}[X]$ to $g(i) \in \mathbb{C}$. For every $g \in \mathbb{Z}[X]$, we may write $g = (X^2 + 1)q + a + bX$, where $q \in \mathbb{Z}[X]$ and $a, b \in \mathbb{Z}$. Since $i^2 + 1 = 0$, we have $g(i) = (i^2 + 1)q(i) + a + bi = a + bi$. Clearly, then, the image of $\rho$ is the set $\{a + bi : a, b \in \mathbb{Z}\}$, and the kernel of $\rho$ is the ideal of $\mathbb{Z}[X]$ generated by the polynomial $X^2 + 1$. This shows that $\mathbb{Z}[i]$ in Example 7.25 is the same as $\mathbb{Z}[i]$ in Example 7.44, and moreover, Theorem 7.26 implies that $\mathbb{Z}[i]$ is isomorphic to $\mathbb{Z}[X]/(X^2 + 1)$.

Therefore, we can directly construct the Gaussian integers as the quotient ring $\mathbb{Z}[X]/(X^2 + 1)$. Likewise the field $\mathbb{Q}[i]$ (see Exercise 7.14) can be constructed directly as $\mathbb{Q}[X]/(X^2 + 1)$. □

Example 7.59. Let $p$ be a prime, and consider the quotient ring $E := \mathbb{Z}_p[X]/(f)$, where $f := X^2 + 1$. If we set $i := [X]_f \in E$, then $E = \mathbb{Z}_p[i] = \{a + bi : a, b \in \mathbb{Z}_p\}$. In particular, $E$ is a ring of cardinality $p^2$. Moreover, we have $i^2 = -1$, and the rules for addition and multiplication in $E$ look exactly the same as they do in $\mathbb{C}$: for all $a, b, a', b' \in \mathbb{Z}_p$, we have

$$(a + bi) + (a' + b'i) = (a + a') + (b + b')i$$

and

$$(a + bi) \cdot (a' + b'i) = (aa' - bb') + (ab' + a'b)i.$$

The ring $E$ may or may not be a field. We now determine for which primes $p$ we get a field.
If $p = 2$, then $0 = 1 + i^2 = (1 + i)^2$ (see Example 7.48), and so in this case, $1 + i$ is a zero divisor and $E$ is not a field.

Now suppose $p$ is odd. There are two subcases to consider: $p \equiv 1 \pmod 4$ and $p \equiv 3 \pmod 4$.

Suppose $p \equiv 1 \pmod 4$. By Theorem 2.31, there exists $c \in \mathbb{Z}_p$ such that $c^2 = -1$, and therefore $f = X^2 + 1 = X^2 - c^2 = (X - c)(X + c)$, and by Example 7.54, we have a ring isomorphism $E \cong \mathbb{Z}_p \times \mathbb{Z}_p$ (which maps $a + bi \in E$ to $(a + bc, a - bc) \in \mathbb{Z}_p \times \mathbb{Z}_p$); in particular, $E$ is not a field. Indeed, $c + i$ is a zero divisor, since $(c + i)(c - i) = c^2 - i^2 = c^2 + 1 = 0$.

Suppose $p \equiv 3 \pmod 4$. By Theorem 2.31, there is no $c \in \mathbb{Z}_p$ such that $c^2 = -1$. It follows that for all $a, b \in \mathbb{Z}_p$, not both zero, we must have $a^2 + b^2 \ne 0$; indeed, suppose that $a^2 + b^2 = 0$, and that, say, $b \ne 0$; then we would have $(a/b)^2 = -1$, contradicting the assumption that $-1$ has no square root in $\mathbb{Z}_p$. Therefore, $a^2 + b^2$ has a multiplicative inverse in $\mathbb{Z}_p$, from which it follows that the formula for multiplicative inverses in $\mathbb{C}$ applies equally well in $E$; that is,

$$(a + bi)^{-1} = \frac{a - bi}{a^2 + b^2}.$$

Therefore, in this case, $E$ is a field. □
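The case analysis of Example 7.59 can be confirmed for small primes by exhaustive search. In this sketch, elements $a + bi$ of $E = \mathbb{Z}_p[i]$ are represented as pairs `(a, b)` (our own convention), `is_field_by_search` looks for an inverse of every non-zero element, and `gauss_inverse` applies the formula $(a + bi)^{-1} = (a - bi)/(a^2 + b^2)$, which is only valid when $a^2 + b^2 \ne 0$ in $\mathbb{Z}_p$. The helper names are hypothetical, `pow(x, -1, p)` needs Python 3.8 or later, and the search takes time proportional to $p^4$, so it is suitable only for tiny primes.

```python
def gauss_mul(u, v, p):
    """(a + bi)(a' + b'i) = (aa' - bb') + (ab' + a'b)i in Z_p[i]."""
    (a, b), (a2, b2) = u, v
    return ((a * a2 - b * b2) % p, (a * b2 + a2 * b) % p)


def is_field_by_search(p):
    """Brute force: every non-zero a + bi must have a multiplicative inverse."""
    elements = [(a, b) for a in range(p) for b in range(p)]
    return all(
        any(gauss_mul(u, v, p) == (1, 0) for v in elements)
        for u in elements
        if u != (0, 0)
    )


def gauss_inverse(u, p):
    """(a + bi)^{-1} = (a - bi)/(a^2 + b^2); raises ValueError if a^2 + b^2 = 0 mod p."""
    a, b = u
    norm_inv = pow((a * a + b * b) % p, -1, p)
    return (a * norm_inv % p, -b * norm_inv % p)


print([p for p in (2, 3, 5, 7, 11, 13) if is_field_by_search(p)])  # [3, 7, 11]
```

The primes that survive are exactly those congruent to 3 modulo 4, as the example predicts; for $p = 5$, for instance, $c = 2$ satisfies $c^2 = -1$ and $(2 + i)(2 - i) = 0$.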


In Example 7.40, we saw a finite field of cardinality 4. The previous example provides us with an explicit construction of a finite field of cardinality $p^2$, for every prime $p$ congruent to 3 modulo 4. As the next example shows, there exist finite fields of cardinality $p^2$ for all odd primes $p$; together with Example 7.40, this covers every prime $p$.

Example 7.60. Let $p$ be an odd prime, and let $d \in \mathbb{Z}_p^*$. Let $f := X^2 - d \in \mathbb{Z}_p[X]$, and consider the ring $E := \mathbb{Z}_p[X]/(f) = \mathbb{Z}_p[\xi]$, where $\xi := [X]_f \in E$. We have $E = \{a + b\xi : a, b \in \mathbb{Z}_p\}$ and $|E| = p^2$. Note that $\xi^2 = d$, and the general rules for arithmetic in $E$ look like this: for all $a, b, a', b' \in \mathbb{Z}_p$, we have

$$(a + b\xi) + (a' + b'\xi) = (a + a') + (b + b')\xi$$

and

$$(a + b\xi) \cdot (a' + b'\xi) = (aa' + bb'd) + (ab' + a'b)\xi.$$

Suppose that $d \in (\mathbb{Z}_p^*)^2$, so that $d = c^2$ for some $c \in \mathbb{Z}_p^*$. Then $f = (X - c)(X + c)$, and as in the previous example, we have a ring isomorphism $E \cong \mathbb{Z}_p \times \mathbb{Z}_p$ (which maps $a + b\xi \in E$ to $(a + bc, a - bc) \in \mathbb{Z}_p \times \mathbb{Z}_p$); in particular, $E$ is not a field.

Suppose that $d \notin (\mathbb{Z}_p^*)^2$. This implies that for all $a, b \in \mathbb{Z}_p$, not both zero, we have $a^2 - b^2 d \ne 0$ (if $b \ne 0$, then $a^2 - b^2 d = 0$ would give $d = (a/b)^2$; if $b = 0$, then $a \ne 0$). Using this, we get the following formula for multiplicative inverses in $E$:

$$(a + b\xi)^{-1} = \frac{a - b\xi}{a^2 - b^2 d}.$$

Therefore, $E$ is a field in this case.

By Theorem 2.20, we know that $|(\mathbb{Z}_p^*)^2| = (p - 1)/2$, and hence there exists $d \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$ for all odd primes $p$. Thus, we have a general (though not explicit) construction for finite fields of cardinality $p^2$ for all odd primes $p$. □
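Example 7.60 is non-explicit only in the choice of $d$; for small $p$ a non-square can simply be found by listing the squares. The sketch below (helper names hypothetical, pairs `(a, b)` standing for $a + b\xi$) does that and then checks the inverse formula $(a + b\xi)^{-1} = (a - b\xi)/(a^2 - b^2 d)$ on every non-zero element. It is an illustration for small odd primes under Python 3.8 or later (for `pow(x, -1, p)`), not an efficient implementation.

```python
def find_nonsquare(p):
    """Smallest d in Z_p^* that is not a square (one exists for odd p, by Theorem 2.20)."""
    squares = {x * x % p for x in range(1, p)}
    return next(d for d in range(2, p) if d not in squares)


def xi_mul(u, v, d, p):
    """(a + b xi)(a' + b' xi) = (aa' + bb'd) + (ab' + a'b) xi, using xi^2 = d."""
    (a, b), (a2, b2) = u, v
    return ((a * a2 + b * b2 * d) % p, (a * b2 + a2 * b) % p)


def xi_inverse(u, d, p):
    """(a + b xi)^{-1} = (a - b xi)/(a^2 - b^2 d); valid when d is a non-square mod p."""
    a, b = u
    norm_inv = pow((a * a - b * b * d) % p, -1, p)
    return (a * norm_inv % p, -b * norm_inv % p)


p = 11
d = find_nonsquare(p)
nonzero = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]
print(d, all(xi_mul(u, xi_inverse(u, d, p), d, p) == (1, 0) for u in nonzero))  # 2 True
```

If instead $d = c^2$ is a square, the element $c + \xi$ is a zero divisor, since $(c + \xi)(c - \xi) = c^2 - d = 0$; with $p = 7$ and $d = 4 = 2^2$, for example, `xi_mul((2, 1), (2, 6), 4, 7)` gives `(0, 0)`.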

Exercise 7.47. Show that if $\rho : F \to R$ is a ring homomorphism from a field $F$ into a ring $R$, then either $R$ is trivial or $\rho$ is injective. Hint: use Exercise 7.25.

Exercise 7.48. Verify that the “is isomorphic to” relation on rings is an equivalence relation; that is, for all rings $R_1, R_2, R_3$, we have:

(a) $R_1 \cong R_1$;

(b) $R_1 \cong R_2$ implies $R_2 \cong R_1$;

(c) $R_1 \cong R_2$ and $R_2 \cong R_3$ implies $R_1 \cong R_3$.

Exercise 7.49. Let $\rho_i : R_i \to R'_i$, for $i = 1, \ldots, k$, be ring homomorphisms. Show that the map

$$\rho : R_1 \times \cdots \times R_k \to R'_1 \times \cdots \times R'_k, \qquad (a_1, \ldots, a_k) \mapsto (\rho_1(a_1), \ldots, \rho_k(a_k))$$

is a ring homomorphism.

Exercise 7.50. Let $\rho : R \to R'$ be a ring homomorphism, and let $a \in R$. Show that $\rho(aR) = \rho(a)\rho(R)$.

Exercise 7.51. Let $\rho : R \to R'$ be a ring homomorphism. Let $S$ be a subring of $R$, and let $\tau : S \to R'$ be the restriction of $\rho$ to $S$. Show that $\tau$ is a ring homomorphism and that $\operatorname{Ker} \tau = \operatorname{Ker} \rho \cap S$.

Exercise 7.52. Suppose $R_1, \ldots, R_k$ are rings. Show that for each $i = 1, \ldots, k$, the projection map $\pi_i : R_1 \times \cdots \times R_k \to R_i$ that sends $(a_1, \ldots, a_k)$ to $a_i$ is a surjective ring homomorphism.

Exercise 7.53. Show that if $R = R_1 \times R_2$ for rings $R_1$ and $R_2$, and $I_1$ is an ideal of $R_1$ and $I_2$ is an ideal of $R_2$, then we have a ring isomorphism $R/(I_1 \times I_2) \cong R_1/I_1 \times R_2/I_2$.

Exercise 7.54. Let $I$ be an ideal of $R$, and $S$ a subring of $R$. As we saw in Exercises 7.28 and 7.29, $I \cap S$ is an ideal of $S$, and $I$ is an ideal of the subring $I + S$. Show that we have a ring isomorphism $(I + S)/I \cong S/(I \cap S)$.

Exercise 7.55. Let $\rho : R \to R'$ be a ring homomorphism with kernel $K$. Let $I$ be an ideal of $R$. Show that we have a ring isomorphism $R/(I + K) \cong \rho(R)/\rho(I)$.
Exercise 7.56. Let $n$ be a positive integer, and consider the natural map that sends $a \in \mathbb{Z}$ to $\bar{a} := [a]_n \in \mathbb{Z}_n$, which we may extend coefficient-wise to a ring homomorphism from $\mathbb{Z}[X]$ to $\mathbb{Z}_n[X]$, as in Example 7.47. Show that for every $f \in \mathbb{Z}[X]$, we have a ring isomorphism $\mathbb{Z}[X]/(f, n) \cong \mathbb{Z}_n[X]/(\bar{f})$.

Exercise 7.57. Let $n$ be a positive integer. Show that we have ring isomorphisms $\mathbb{Z}[X]/(n) \cong \mathbb{Z}_n[X]$, $\mathbb{Z}[X]/(X) \cong \mathbb{Z}$, and $\mathbb{Z}[X]/(X, n) \cong \mathbb{Z}_n$.

Exercise 7.58. Let $n = pq$, where $p$ and $q$ are distinct primes. Show that we have a ring isomorphism $\mathbb{Z}_n[X] \cong \mathbb{Z}_p[X] \times \mathbb{Z}_q[X]$.

Exercise 7.59. Let $p$ be a prime with $p \equiv 1 \pmod 4$. Show that we have a ring isomorphism $\mathbb{Z}[X]/(X^2 + 1, p) \cong \mathbb{Z}_p \times \mathbb{Z}_p$.

Exercise 7.60. Let $\rho : R \to R'$ be a surjective ring homomorphism. Let $\mathcal{S}$ be the set of all ideals of $R$ that contain $\operatorname{Ker} \rho$, and let $\mathcal{S}'$ be the set of all ideals of $R'$. Show that the sets $\mathcal{S}$ and $\mathcal{S}'$ are in one-to-one correspondence, via the map that sends $I \in \mathcal{S}$ to $\rho(I) \in \mathcal{S}'$. Moreover, show that under this correspondence, prime ideals in $\mathcal{S}$ correspond to prime ideals in $\mathcal{S}'$, and maximal ideals in $\mathcal{S}$ correspond to maximal ideals in $\mathcal{S}'$. (See Exercise 7.38.)

Exercise 7.61. Let $n$ be a positive integer whose factorization into primes is $n = p_1^{e_1} \cdots p_r^{e_r}$. What are the prime ideals of $\mathbb{Z}_n$? (See Exercise 7.38.)

Exercise 7.62. Let $\rho : R \to S$ be a ring homomorphism. Show that $\rho(R^*) \subseteq S^*$, and that the restriction of $\rho$ to $R^*$ yields a group homomorphism $\rho^* : R^* \to S^*$.

Exercise 7.63. Let $R$ be a ring, and let $x_1, \ldots, x_n$ be elements of $R$. Show that the rings $R$ and $R[X_1, \ldots, X_n]/(X_1 - x_1, \ldots, X_n - x_n)$ are isomorphic.

Exercise 7.64. This exercise and the next generalize the Chinese remainder theorem to arbitrary rings. Suppose $I$ and $J$ are two ideals of a ring $R$ such that $I + J = R$. Show that the map $\rho : R \to R/I \times R/J$ that sends $a \in R$ to $([a]_I, [a]_J)$ is a surjective ring homomorphism with kernel $IJ$ (see Exercise 7.41). Conclude that $R/(IJ)$ is isomorphic to $R/I \times R/J$.

Exercise 7.65. Generalize the previous exercise, showing that $R/(I_1 \cdots I_k)$ is isomorphic to $R/I_1 \times \cdots \times R/I_k$, where $R$ is a ring, and $I_1, \ldots, I_k$ are ideals of $R$, provided $I_i + I_j = R$ for all $i, j$ such that $i \ne j$.

Exercise 7.66. Let $\mathbb{Q}^{(m)}$ be the subring of $\mathbb{Q}$ defined in Example 7.26. Let us define the map $\rho : \mathbb{Q}^{(m)} \to \mathbb{Z}_m$ as follows. For $a/b \in \mathbb{Q}$ with $b$ relatively prime to $m$, $\rho(a/b) := [a]_m ([b]_m)^{-1}$. Show that $\rho$ is unambiguously defined, and is a surjective ring homomorphism. Also, describe the kernel of $\rho$.





Exercise 7.67. Let $R$ be a ring, $a \in R^*$, and $b \in R$. Define the map $\rho : R[X] \to R[X]$ that sends $g \in R[X]$ to $g(aX + b)$. Show that $\rho$ is a ring automorphism.

Exercise 7.68. Consider the subring $\mathbb{Z}[1/2]$ of $\mathbb{Q}$. Show that $\mathbb{Z}[1/2] = \{a/2^i : a, i \in \mathbb{Z}, i \ge 0\}$, that $(\mathbb{Z}[1/2])^* = \{\pm 2^i : i \in \mathbb{Z}\}$, and that every non-zero ideal of $\mathbb{Z}[1/2]$ is of the form $(m)$, for some uniquely determined, positive, odd integer $m$.


7.5 The structure of $\mathbb{Z}_n^*$

We are now in a position to precisely characterize the structure of the group $\mathbb{Z}_n^*$, for an arbitrary integer $n > 1$. This characterization will prove to be very useful in a number of applications.

Suppose $n = p_1^{e_1} \cdots p_r^{e_r}$ is the factorization of $n$ into primes. By the Chinese remainder theorem (see Theorem 2.8 and Example 7.51), we have the ring isomorphism

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}}, \qquad [a]_n \mapsto ([a]_{p_1^{e_1}}, \ldots, [a]_{p_r^{e_r}}),$$

and restricting $\theta$ to $\mathbb{Z}_n^*$ yields a group isomorphism

$$\mathbb{Z}_n^* \cong \mathbb{Z}_{p_1^{e_1}}^* \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^*.$$

Thus, to determine the structure of the group $\mathbb{Z}_n^*$ for general $n$, it suffices to determine the structure for $n = p^e$, where $p$ is prime. By Theorem 2.10, we already know the order of the group $\mathbb{Z}_{p^e}^*$, namely, $\varphi(p^e) = p^{e-1}(p - 1)$, where $\varphi$ is Euler’s phi function.

The main result of this section is the following: 

Theorem 7.28. If $p$ is an odd prime, then for every positive integer $e$, the group $\mathbb{Z}_{p^e}^*$ is cyclic. The group $\mathbb{Z}_{2^e}^*$ is cyclic for $e = 1$ or $2$, but not for $e \ge 3$. For $e \ge 3$, $\mathbb{Z}_{2^e}^*$ is isomorphic to the additive group $\mathbb{Z}_2 \times \mathbb{Z}_{2^{e-2}}$.

In the case where e = 1, this theorem is a special case of the following, more 
general, theorem: 

Theorem 7.29. Let $D$ be an integral domain and $G$ a subgroup of $D^*$ of finite order. Then $G$ is cyclic.

Proof. Suppose $G$ is not cyclic. If $m$ is the exponent of $G$, then by Theorem 6.41, we know that $m < |G|$. Moreover, by definition, $a^m = 1$ for all $a \in G$; that is, every element of $G$ is a root of the polynomial $X^m - 1 \in D[X]$. But by Theorem 7.14, a polynomial of degree $m$ over an integral domain has at most $m$ distinct roots, and this contradicts the fact that $m < |G|$. □
This theorem immediately implies that $\mathbb{Z}_p^*$ is cyclic for every prime $p$, since $\mathbb{Z}_p$ is a field; however, we cannot directly use this theorem to prove that $\mathbb{Z}_{p^e}^*$ is cyclic for $e > 1$ (and $p$ odd), because $\mathbb{Z}_{p^e}$ is not a field. To deal with the case $e > 1$, we need a few simple facts.

Lemma 7.30. Let $p$ be a prime. For every positive integer $e$, if $a \equiv b \pmod{p^e}$, then $a^p \equiv b^p \pmod{p^{e+1}}$.

Proof. Suppose $a \equiv b \pmod{p^e}$, so that $a = b + cp^e$ for some $c \in \mathbb{Z}$. Then $a^p = b^p + p\,b^{p-1} c\,p^e + d\,p^{2e}$ for some $d \in \mathbb{Z}$, and it follows that $a^p \equiv b^p \pmod{p^{e+1}}$. □

Lemma 7.31. Let $p$ be a prime, and let $e$ be a positive integer such that $p^e > 2$. If $a \equiv 1 + p^e \pmod{p^{e+1}}$, then $a^p \equiv 1 + p^{e+1} \pmod{p^{e+2}}$.

Proof. Suppose $a \equiv 1 + p^e \pmod{p^{e+1}}$. By Lemma 7.30, $a^p \equiv (1 + p^e)^p \pmod{p^{e+2}}$. Expanding $(1 + p^e)^p$, we have

$$(1 + p^e)^p = 1 + p \cdot p^e + \sum_{k=2}^{p-1} \binom{p}{k} p^{ek} + p^{ep}.$$

By Exercise 1.14, all of the terms in the sum on $k$ are divisible by $p^{1+2e}$, and $1 + 2e \ge e + 2$ for all $e \ge 1$. For the term $p^{ep}$, the assumption that $p^e > 2$ means that either $p \ge 3$ or $e \ge 2$, which implies $ep \ge e + 2$. □
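Lemma 7.31 is easy to test numerically, including the role of the hypothesis $p^e > 2$. The sketch below (function name hypothetical) samples integers $a = 1 + p^e + t\,p^{e+1}$, all of which satisfy $a \equiv 1 + p^e \pmod{p^{e+1}}$, and compares $a^p \bmod p^{e+2}$ with $1 + p^{e+1}$. Sampling of course proves nothing; it only illustrates the statement.

```python
def lemma_7_31_holds(p, e, trials=50):
    """Test a^p = 1 + p^(e+1) (mod p^(e+2)) for sampled a = 1 + p^e (mod p^(e+1))."""
    m1, m2 = p ** (e + 1), p ** (e + 2)
    target = 1 + p ** (e + 1)   # already reduced, since it is less than p^(e+2)
    return all(pow(1 + p ** e + t * m1, p, m2) == target for t in range(trials))


print(all(lemma_7_31_holds(p, e) for p in (2, 3, 5, 7) for e in (1, 2, 3) if p ** e > 2))  # True
print(lemma_7_31_holds(2, 1))  # False: here p^e = 2, so the hypothesis fails
```

For $p = 2$, $e = 1$ the conclusion already fails at $a = 3$, since $3^2 = 9 \equiv 1 \pmod 8$ rather than $5$.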

Now consider Theorem 7.28 in the case where $p$ is odd. As we already know that $\mathbb{Z}_p^*$ is cyclic, assume $e > 1$. Let $x \in \mathbb{Z}$ be chosen so that $[x]_p$ generates $\mathbb{Z}_p^*$. Suppose the multiplicative order of $[x]_{p^e} \in \mathbb{Z}_{p^e}^*$ is $m$. We have $x^m \equiv 1 \pmod{p^e}$; hence, $x^m \equiv 1 \pmod p$, and so it must be the case that $p - 1$ divides $m$; thus, $[x^{m/(p-1)}]_{p^e}$ has multiplicative order exactly $p - 1$. By Theorem 6.38, if we find an integer $y$ such that $[y]_{p^e}$ has multiplicative order $p^{e-1}$, then $[x^{m/(p-1)} y]_{p^e}$ has multiplicative order $(p - 1)p^{e-1}$, and we are done. We claim that $y := 1 + p$ does the job. Any integer between $0$ and $p^e - 1$ can be expressed as an $e$-digit number in base $p$; for example, $y = (0 \cdots 0\,1\,1)_p$. If we compute successive $p$th powers of $y$ modulo $p^e$, then by Lemma 7.31 we have


y mod p e = (0 

011) p , 

y p mod p e = (* 

• *10 IV 

2 

y p ~ mod p e = (* 

• *100 iv 

pe ~ 2 mod p e = (10 • • 

oiv 

p ‘ ' mod p e = (0 

0 l) p . 


Here, “$*$” indicates an arbitrary digit. From this table of values, it is clear (see Theorem 6.37) that $[y]_{p^e}$ has multiplicative order $p^{e-1}$. That proves Theorem 7.28 for odd $p$.
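The proof for odd $p$ is constructive, so it can be followed step by step in code. This sketch is an illustration under stated assumptions: the primitive root modulo $p$ is found by brute-force search, multiplicative orders are computed by repeated multiplication (so only small moduli are practical), and the helper names are ours. It returns $[x^{m/(p-1)}(1 + p)]_{p^e}$, which by the argument above should have order $(p - 1)p^{e-1}$.

```python
def mult_order(a, n):
    """Multiplicative order of [a]_n; assumes gcd(a, n) == 1 (otherwise the loop never ends)."""
    k, x = 1, a % n
    while x != 1:
        x = x * a % n
        k += 1
    return k


def primitive_root_mod_p(p):
    """Brute-force search for x such that [x]_p generates Z_p^*."""
    return next(x for x in range(2, p) if mult_order(x, p) == p - 1)


def generator_mod_prime_power(p, e):
    """For odd p: combine an element of order p - 1 with y = 1 + p, as in the proof."""
    n = p ** e
    x = primitive_root_mod_p(p)
    m = mult_order(x, n)           # p - 1 divides m
    z = pow(x, m // (p - 1), n)    # [x^(m/(p-1))] has order exactly p - 1
    y = 1 + p                      # [1 + p] has order p^(e-1)
    return z * y % n


g = generator_mod_prime_power(5, 3)
print(mult_order(g, 125) == 4 * 25)  # True
```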

We now prove Theorem 7.28 in the case $p = 2$. For $e = 1$ and $e = 2$, the theorem is easily verified. Suppose $e \ge 3$. Consider the subgroup $G \subseteq \mathbb{Z}_{2^e}^*$ generated by $[5]_{2^e}$. Expressing integers between $0$ and $2^e - 1$ as $e$-digit binary numbers, and applying Lemma 7.31, we have

$$\begin{aligned}
5 \bmod 2^e &= (0 \cdots 0\,1\,0\,1)_2, \\
5^2 \bmod 2^e &= (* \cdots *\,1\,0\,0\,1)_2, \\
&\;\;\vdots \\
5^{2^{e-3}} \bmod 2^e &= (1\,0 \cdots 0\,1)_2, \\
5^{2^{e-2}} \bmod 2^e &= (0 \cdots 0\,1)_2.
\end{aligned}$$

So it is clear (see Theorem 6.37) that $[5]_{2^e}$ has multiplicative order $2^{e-2}$. We claim that $[-1]_{2^e} \notin G$. If it were, then since it has multiplicative order 2, and since every cyclic group of even order has precisely one element of order 2 (see Theorem 6.32), it must be equal to $[5^{2^{e-3}}]_{2^e}$; however, it is clear from the above calculation that $5^{2^{e-3}} \not\equiv -1 \pmod{2^e}$. Let $H \subseteq \mathbb{Z}_{2^e}^*$ be the subgroup generated by $[-1]_{2^e}$. Then from the above, $G \cap H = \{[1]_{2^e}\}$, and hence by Theorem 6.25, $G \times H$ is isomorphic to the subgroup $G \cdot H$ of $\mathbb{Z}_{2^e}^*$. But since the orders of $G \times H$ and $\mathbb{Z}_{2^e}^*$ are equal, we must have $G \cdot H = \mathbb{Z}_{2^e}^*$. That proves the theorem.
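For the case $p = 2$, the three facts used in the proof (that $[5]_{2^e}$ has order $2^{e-2}$, that $[-1]_{2^e} \notin G$, and that $G \cdot H$ is all of $\mathbb{Z}_{2^e}^*$) can be verified exhaustively for small $e$. The sketch below (function name hypothetical) also checks that every unit satisfies $a^{2^{e-2}} = 1$, so no element has order $2^{e-1} = |\mathbb{Z}_{2^e}^*|$ and the group is not cyclic for $e \ge 3$.

```python
def check_two_power_structure(e):
    """Verify, by exhaustive computation, the facts used for Z_{2^e}^* with e >= 3."""
    n = 2 ** e
    # G = <[5]>: collect 5^0, 5^1, ..., 5^(2^(e-2) - 1) modulo 2^e.
    G = set()
    x = 1
    for _ in range(2 ** (e - 2)):
        G.add(x)
        x = x * 5 % n
    # x is now 5^(2^(e-2)); the order is exactly 2^(e-2) if x == 1 and no earlier power repeated.
    order_ok = x == 1 and len(G) == 2 ** (e - 2)
    minus_one_outside = (n - 1) not in G
    # G * H with H = {[1], [-1]} should be every odd residue, i.e. all of Z_{2^e}^*.
    covers_units = {g * h % n for g in G for h in (1, n - 1)} == set(range(1, n, 2))
    # Every unit satisfies a^(2^(e-2)) = 1, so nothing has order 2^(e-1): not cyclic.
    not_cyclic = all(pow(a, 2 ** (e - 2), n) == 1 for a in range(1, n, 2))
    return order_ok and minus_one_outside and covers_units and not_cyclic


print(all(check_two_power_structure(e) for e in range(3, 13)))  # True
```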

Example 7.61. Let $p$ be an odd prime, and let $d$ be a positive integer dividing $p - 1$. Since $\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$, Theorem 6.32 implies that $(\mathbb{Z}_p^*)^d$ is the unique subgroup of $\mathbb{Z}_p^*$ of order $(p - 1)/d$, and moreover, $(\mathbb{Z}_p^*)^d = \mathbb{Z}_p^*\{(p - 1)/d\}$; that is, for all $\alpha \in \mathbb{Z}_p^*$, we have

$$\alpha = \beta^d \text{ for some } \beta \in \mathbb{Z}_p^* \iff \alpha^{(p-1)/d} = 1.$$

Setting $d = 2$, we arrive again at Euler’s criterion (Theorem 2.21), but by a very different, and perhaps more elegant, route than that taken in our original proof of that theorem. □
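The criterion of Example 7.61 compares two descriptions of the same subgroup, which a short computation can confirm for small primes. In this sketch (helper names hypothetical), `dth_powers` computes $(\mathbb{Z}_p^*)^d$ directly, and `kernel_of_power` computes $\{\alpha : \alpha^{(p-1)/d} = 1\}$; for $d = 2$ the latter is Euler’s criterion for quadratic residues.

```python
def dth_powers(p, d):
    """The subgroup (Z_p^*)^d, computed directly as {b^d : b in Z_p^*}."""
    return {pow(b, d, p) for b in range(1, p)}


def kernel_of_power(p, d):
    """Z_p^*{(p-1)/d}: the elements a with a^((p-1)/d) = 1."""
    return {a for a in range(1, p) if pow(a, (p - 1) // d, p) == 1}


def check_power_criterion(p):
    """For every d dividing p - 1, the two sets agree and have (p - 1)/d elements."""
    return all(
        dth_powers(p, d) == kernel_of_power(p, d) and len(dth_powers(p, d)) == (p - 1) // d
        for d in range(1, p)
        if (p - 1) % d == 0
    )


print(all(check_power_criterion(p) for p in (3, 5, 7, 11, 13, 17, 19, 23)))  # True
print(sorted(kernel_of_power(11, 2)))  # [1, 3, 4, 5, 9], the squares modulo 11
```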

Exercise 7.69. Show that if $n$ is a positive integer, the group $\mathbb{Z}_n^*$ is cyclic if and only if

$$n = 1, 2, 4, p^e, \text{ or } 2p^e,$$

where $p$ is an odd prime and $e$ is a positive integer.

Exercise 7.70. Let $n = pq$, where $p$ and $q$ are distinct primes such that $p = 2p' + 1$ and $q = 2q' + 1$, where $p'$ and $q'$ are themselves prime. Show that the subgroup $(\mathbb{Z}_n^*)^2$ of squares is a cyclic group of order $p'q'$.
Exercise 7.71. Let $n = pq$, where $p$ and $q$ are distinct primes such that $p \nmid (q - 1)$ and $q \nmid (p - 1)$.

(a) Show that the map that sends $[a]_n \in \mathbb{Z}_n^*$ to $[a^n]_{n^2} \in (\mathbb{Z}_{n^2}^*)^n$ is a group isomorphism (in particular, you need to show that this map is unambiguously defined).

(b) Consider the element $\alpha := [1 + n]_{n^2} \in \mathbb{Z}_{n^2}^*$; show that for every non-negative integer $k$, $\alpha^k = [1 + kn]_{n^2}$; deduce that $\alpha$ has multiplicative order $n$, and also that the identity $\alpha^k = [1 + kn]_{n^2}$ holds for all integers $k$.

(c) Show that the map that sends $([k]_n, [a]_n) \in \mathbb{Z}_n \times \mathbb{Z}_n^*$ to $[(1 + kn)a^n]_{n^2} \in \mathbb{Z}_{n^2}^*$ is a group isomorphism.

Exercise 7.72. This exercise develops an alternative proof of Theorem 7.29 that relies on less group theory. Let $n$ be the order of the group $G$. Using Theorem 7.14, show that for all $d \mid n$, there are at most $d$ elements in the group whose multiplicative order divides $d$. From this, deduce that for all $d \mid n$, the number of elements of multiplicative order $d$ is either $0$ or $\varphi(d)$. Now use Theorem 2.40 to deduce that for all $d \mid n$ (and in particular, for $d = n$), the number of elements of multiplicative order $d$ is equal to $\varphi(d)$.
To understand the algorithmic aspects of number theory and algebra, and applications such as cryptography, a firm grasp of the basics of probability theory is required. This chapter introduces concepts from probability theory, starting with the basic notions of probability distributions on finite sample spaces, and then continuing with conditional probability and independence, random variables, and expectation. Applications such as “balls and bins,” “hash functions,” and the “leftover hash lemma” are also discussed. The chapter closes by extending the basic theory to probability distributions on countably infinite sample spaces.


8.1 Basic definitions 

Let $\Omega$ be a finite, non-empty set. A probability distribution on $\Omega$ is a function
$P : \Omega \to [0, 1]$ that satisfies the following property:

$$\sum_{\omega \in \Omega} P(\omega) = 1. \qquad (8.1)$$

The set $\Omega$ is called the sample space of P.

Intuitively, the elements of $\Omega$ represent the possible outcomes of a random
experiment, where the probability of outcome $\omega \in \Omega$ is $P(\omega)$. For now, we
shall only consider probability distributions on finite sample spaces. Later in this 
chapter, in §8.10, we generalize this to allow probability distributions on countably 
infinite sample spaces. 

Example 8.1. If we think of rolling a fair die, then setting $\Omega := \{1, 2, 3, 4, 5, 6\}$,
and $P(\omega) := 1/6$ for all $\omega \in \Omega$, gives a probability distribution that naturally
describes the possible outcomes of the experiment. □ 

Example 8.2. More generally, if $\Omega$ is any non-empty, finite set, and $P(\omega) := 1/|\Omega|$
for all $\omega \in \Omega$, then P is called the uniform distribution on $\Omega$. □
Example 8.3. A coin toss is an example of a Bernoulli trial, which in general
is an experiment with only two possible outcomes: success, which occurs with
probability p, and failure, which occurs with probability $q := 1 - p$. Of course,
success and failure are arbitrary names, which can be changed as convenient. In the
case of a coin, we might associate success with the outcome that the coin comes up
heads. For a fair coin, we have $p = q = 1/2$; for a biased coin, we have $p \neq 1/2$. □

An event is a subset A of $\Omega$, and the probability of A is defined to be

$$P[A] := \sum_{\omega \in A} P(\omega). \qquad (8.2)$$

While an event is simply a subset of the sample space, when discussing the proba- 
bility of an event (or other properties to be introduced later), the discussion always 
takes place relative to a particular probability distribution, which may be implicit 
from context. 

For events A and B, their union $A \cup B$ logically represents the event that either
the event A or the event B occurs (or both), while their intersection $A \cap B$ logically
represents the event that both A and B occur. For an event A, we define its
complement $\overline{A} := \Omega \setminus A$, which logically represents the event that A does not
occur.

In working with events, one makes frequent use of the usual rules of Boolean 
logic. De Morgan’s law says that for all events A and B, 

$$\overline{A \cup B} = \overline{A} \cap \overline{B} \quad\text{and}\quad \overline{A \cap B} = \overline{A} \cup \overline{B}.$$

We also have the Boolean distributive law: for all events A, B, and C, 

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \quad\text{and}\quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C).$$

Example 8.4. Continuing with Example 8.1, the event that the die has an odd
value is $A := \{1, 3, 5\}$, and we have $P[A] = 1/2$. The event that the die has a
value greater than 2 is $B := \{3, 4, 5, 6\}$, and $P[B] = 2/3$. The event that the die
has a value that is at most 2 is $\overline{B} = \{1, 2\}$, and $P[\overline{B}] = 1/3$. The event that the
value of the die is odd or exceeds 2 is $A \cup B = \{1, 3, 4, 5, 6\}$, and $P[A \cup B] = 5/6$.
The event that the value of the die is odd and exceeds 2 is $A \cap B = \{3, 5\}$, and
$P[A \cap B] = 1/3$. □

Example 8.5. If P is the uniform distribution on a set $\Omega$, and A is a subset of $\Omega$,
then $P[A] = |A|/|\Omega|$. □

We next derive some elementary facts about probabilities of certain events, and 
relations among them. It is clear from the definitions that 


$$P[\emptyset] = 0 \quad\text{and}\quad P[\Omega] = 1,$$

and that for every event A, we have

$$P[\overline{A}] = 1 - P[A].$$

Now consider events A and B, and their union $A \cup B$. We have

$$P[A \cup B] \le P[A] + P[B]; \qquad (8.3)$$

moreover,

$$P[A \cup B] = P[A] + P[B] \quad \text{if } A \text{ and } B \text{ are disjoint}, \qquad (8.4)$$

that is, if $A \cap B = \emptyset$. The exact formula for arbitrary events A and B is:

$$P[A \cup B] = P[A] + P[B] - P[A \cap B]. \qquad (8.5)$$

(8.3), (8.4), and (8.5) all follow from the observation that in the expression

$$P[A] + P[B] = \sum_{\omega \in A} P(\omega) + \sum_{\omega \in B} P(\omega),$$

the value $P(\omega)$ is counted once for each $\omega \in A \cup B$, except for those $\omega \in A \cap B$,
for which $P(\omega)$ is counted twice.
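These definitions translate directly into exact arithmetic. As a minimal sketch, we assume a distribution on a finite sample space is stored as a Python dict mapping outcomes to `Fraction` probabilities. Under that assumption, definition (8.2) becomes a one-line sum. The example reproduces the numbers of Example 8.4 and checks (8.5) on them. The helper name `prob` is our own.

```python
from fractions import Fraction

# Uniform distribution on the faces of a fair die (Example 8.1), stored as a dict.
P = {w: Fraction(1, 6) for w in range(1, 7)}
omega = set(P)

def prob(event, dist):
    """P[A] := sum of P(w) over w in A, as in (8.2)."""
    return sum((dist[w] for w in event), Fraction(0))

A = {1, 3, 5}        # the value is odd
B = {3, 4, 5, 6}     # the value exceeds 2
print(prob(A, P), prob(B, P), prob(omega - B, P), prob(A | B, P), prob(A & B, P))
# 1/2 2/3 1/3 5/6 1/3

# (8.5): P[A ∪ B] = P[A] + P[B] - P[A ∩ B]
print(prob(A | B, P) == prob(A, P) + prob(B, P) - prob(A & B, P))  # True
```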

Example 8.6. Alice rolls two dice, and asks Bob to guess a value that appears on
either of the two dice (without looking). Let us model this situation by considering
the uniform distribution on $\Omega := \{1, \ldots, 6\} \times \{1, \ldots, 6\}$, where for each pair
$(s, t) \in \Omega$, s represents the value of the first die, and t the value of the second.

For $k = 1, \ldots, 6$, let $A_k$ be the event that the first die is k, and $B_k$ the event that
the second die is k. Let $C_k := A_k \cup B_k$ be the event that k appears on either of the
two dice. No matter what value k Bob chooses, the probability that this choice is
correct is

$$P[C_k] = P[A_k \cup B_k] = P[A_k] + P[B_k] - P[A_k \cap B_k] = 1/6 + 1/6 - 1/36 = 11/36,$$

which is slightly less than the estimate $P[A_k] + P[B_k] = 12/36$ obtained from (8.3). □
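The value 11/36 can also be confirmed by enumerating the 36 equally likely outcomes directly. The following is a small sketch; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

# Uniform distribution on pairs (s, t) of die values (Example 8.6).
omega = list(product(range(1, 7), repeat=2))

def prob_k_appears(k: int) -> Fraction:
    """P[C_k], where C_k is the event that k appears on either die."""
    hits = sum(1 for (s, t) in omega if s == k or t == k)
    return Fraction(hits, len(omega))

print([str(prob_k_appears(k)) for k in range(1, 7)])
# ['11/36', '11/36', '11/36', '11/36', '11/36', '11/36']
print(Fraction(1, 6) + Fraction(1, 6))  # 1/3, the weaker estimate from (8.3)
```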

If $\{A_i\}_{i \in I}$ is a family of events, indexed by some set I, we can naturally form the
union $\bigcup_{i \in I} A_i$ and intersection $\bigcap_{i \in I} A_i$. If $I = \emptyset$, then by definition, the union is $\emptyset$,
and by special convention, the intersection is the entire sample space $\Omega$. Logically,
the union represents the event that some $A_i$ occurs, and the intersection represents
the event that all the $A_i$’s occur. De Morgan’s law generalizes as follows:

$$\overline{\bigcup_{i \in I} A_i} = \bigcap_{i \in I} \overline{A_i} \quad\text{and}\quad \overline{\bigcap_{i \in I} A_i} = \bigcup_{i \in I} \overline{A_i},$$
and if B is an event, then the Boolean distributive law generalizes as follows:

$$B \cap \Big(\bigcup_{i \in I} A_i\Big) = \bigcup_{i \in I} (B \cap A_i) \quad\text{and}\quad B \cup \Big(\bigcap_{i \in I} A_i\Big) = \bigcap_{i \in I} (B \cup A_i).$$

We now generalize (8.3), (8.4), and (8.5) from pairs of events to families of
events. Let $\{A_i\}_{i \in I}$ be a finite family of events (i.e., the index set I is finite).

Using (8.3), it follows by induction on $|I|$ that

$$P\Big[\bigcup_{i \in I} A_i\Big] \le \sum_{i \in I} P[A_i], \qquad (8.6)$$

which is known as Boole’s inequality (and sometimes called the union bound).
Analogously, using (8.4), it follows by induction on $|I|$ that

$$P\Big[\bigcup_{i \in I} A_i\Big] = \sum_{i \in I} P[A_i] \quad \text{if } \{A_i\}_{i \in I} \text{ is pairwise disjoint}, \qquad (8.7)$$

that is, if $A_i \cap A_j = \emptyset$ for all $i, j \in I$ with $i \neq j$. We shall refer to (8.7) as Boole’s
equality. Both (8.6) and (8.7) are invaluable tools in calculating or estimating the
probability of an event A by breaking A up into a family $\{A_i\}_{i \in I}$ of smaller, and
hopefully simpler, events, whose union is A. We shall make frequent use of them.

The generalization of (8.5) is messier. Consider first the case of three events, A, 
B, and C. We have 

$$P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C].$$

Thus, starting with the sum of the probabilities of the individual events, we have
to subtract a “correction term” that consists of the sum of probabilities of all inter- 
sections of pairs of events; however, this is an “over-correction,” and we have to 
correct the correction by adding back in the probability of the intersection of all 
three events. The general statement is as follows: 

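Theorem 8.1 (Inclusion/exclusion principle). Let $\{A_i\}_{i \in I}$ be a finite family of
events. Then

$$P\Big[\bigcup_{i \in I} A_i\Big] = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} P\Big[\bigcap_{j \in J} A_j\Big],$$

the sum being over all non-empty subsets J of I.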

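Proof. For $\omega \in \Omega$ and $B \subseteq \Omega$, define $\delta_\omega[B] := 1$ if $\omega \in B$, and $\delta_\omega[B] := 0$
if $\omega \notin B$. As a function of $\omega$, $\delta_\omega[B]$ is simply the characteristic function of
B. One may easily verify that for all $\omega \in \Omega$, $B \subseteq \Omega$, and $C \subseteq \Omega$, we have
$\delta_\omega[\overline{B}] = 1 - \delta_\omega[B]$ and $\delta_\omega[B \cap C] = \delta_\omega[B]\,\delta_\omega[C]$. It is also easily seen that for
every $B \subseteq \Omega$, we have $\sum_{\omega \in \Omega} P(\omega)\,\delta_\omega[B] = P[B]$.

Let $A := \bigcup_{i \in I} A_i$, and for $J \subseteq I$, let $A_J := \bigcap_{j \in J} A_j$. For every $\omega \in \Omega$,

$$\begin{aligned}
1 - \delta_\omega[A] &= \delta_\omega[\overline{A}] = \delta_\omega\Big[\bigcap_{i \in I} \overline{A_i}\Big] = \prod_{i \in I} \delta_\omega[\overline{A_i}] = \prod_{i \in I} (1 - \delta_\omega[A_i])\\
&= \sum_{J \subseteq I} (-1)^{|J|} \prod_{j \in J} \delta_\omega[A_j] = \sum_{J \subseteq I} (-1)^{|J|} \delta_\omega[A_J],
\end{aligned}$$

and so, since the term with $J = \emptyset$ is $\delta_\omega[\Omega] = 1$,

$$\delta_\omega[A] = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} \delta_\omega[A_J]. \qquad (8.8)$$

Multiplying (8.8) by $P(\omega)$, and summing over all $\omega \in \Omega$, we have

$$\begin{aligned}
P[A] &= \sum_{\omega \in \Omega} P(\omega)\,\delta_\omega[A] = \sum_{\omega \in \Omega} P(\omega) \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} \delta_\omega[A_J]\\
&= \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} \sum_{\omega \in \Omega} P(\omega)\,\delta_\omega[A_J] = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} P[A_J].
\end{aligned}$$

□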

One can also state the inclusion/exclusion principle in a slightly different way,
splitting the sum into terms with $|J| = 1$, $|J| = 2$, etc., as follows:

$$P\Big[\bigcup_{i \in I} A_i\Big] = \sum_{i \in I} P[A_i] + \sum_{k=2}^{|I|} (-1)^{k-1} \sum_{\substack{J \subseteq I \\ |J| = k}} P\Big[\bigcap_{j \in J} A_j\Big],$$

where the last sum in this formula is taken over all subsets J of I of size k.
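Theorem 8.1 can be checked mechanically on small examples. The following sketch uses only the standard library. It draws a hypothetical random distribution on a 10-point sample space and four random events. It then evaluates the right-hand side of the inclusion/exclusion formula by iterating over all non-empty subsets J, and compares the result with the left-hand side computed directly. The fixed seed only makes the run reproducible; the identity should hold for any seed.

```python
import random
from fractions import Fraction
from itertools import combinations

def prob(event, dist):
    """P[A] as in (8.2): the sum of P(w) over the outcomes w in A."""
    return sum((dist[w] for w in event), Fraction(0))

def inclusion_exclusion(events, dist):
    """Right-hand side of Theorem 8.1: the sum over non-empty J of
    (-1)^(|J|-1) * P[intersection of the A_j, j in J]."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        for J in combinations(events, k):
            total += (-1) ** (k - 1) * prob(set.intersection(*J), dist)
    return total

rng = random.Random(1)                              # fixed seed for reproducibility
weights = [rng.randint(1, 9) for _ in range(10)]
dist = {w: Fraction(x, sum(weights)) for w, x in enumerate(weights)}
events = [{w for w in dist if rng.random() < 0.5} for _ in range(4)]

lhs = prob(set().union(*events), dist)              # P[A_1 ∪ ... ∪ A_4] computed directly
print(lhs == inclusion_exclusion(events, dist))     # True
```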

We next consider a useful way to “glue together” probability distributions. Sup- 
pose one conducts two physically separate and unrelated random experiments, with 
each experiment modeled separately as a probability distribution. What we would 
like is a way to combine these distributions, obtaining a single probability dis- 
tribution that models the two experiments as one grand experiment. This can be 
accomplished in general, as follows. 

Let $P_1 : \Omega_1 \to [0, 1]$ and $P_2 : \Omega_2 \to [0, 1]$ be probability distributions. Their
product distribution $P := P_1 P_2$ is defined as follows:

$$P : \Omega_1 \times \Omega_2 \to [0, 1], \qquad (\omega_1, \omega_2) \mapsto P_1(\omega_1)\, P_2(\omega_2).$$

It is easily verified that P is a probability distribution on the sample space $\Omega_1 \times \Omega_2$:

$$\sum_{\omega_1, \omega_2} P(\omega_1, \omega_2) = \sum_{\omega_1, \omega_2} P_1(\omega_1)\, P_2(\omega_2) = \Big(\sum_{\omega_1} P_1(\omega_1)\Big)\Big(\sum_{\omega_2} P_2(\omega_2)\Big) = 1 \cdot 1 = 1.$$

More generally, if $P_i : \Omega_i \to [0, 1]$, for $i = 1, \ldots, n$, are probability distributions,
then their product distribution is $P := P_1 \cdots P_n$, where

$$P : \Omega_1 \times \cdots \times \Omega_n \to [0, 1], \qquad (\omega_1, \ldots, \omega_n) \mapsto P_1(\omega_1) \cdots P_n(\omega_n).$$

If $P_1 = P_2 = \cdots = P_n$, then we may write $P = P_1^n$. It is clear from the definitions
that if each $P_i$ is the uniform distribution on $\Omega_i$, then P is the uniform distribution
on $\Omega_1 \times \cdots \times \Omega_n$.

Example 8.7. We can view the probability distribution P in Example 8.6 as $P_1^2$,
where $P_1$ is the uniform distribution on $\{1, \ldots, 6\}$. □


Example 8.8. Suppose we have a coin that comes up heads with some probability
p, and tails with probability $q := 1 - p$. We toss the coin n times, and record the
outcomes. We can model this as the product distribution $P = P_1^n$, where $P_1$ is the
distribution of a Bernoulli trial (see Example 8.3) with success probability p, and
where we identify success with heads, and failure with tails. The sample space $\Omega$
of P is the set of all $2^n$ tuples $\omega = (\omega_1, \ldots, \omega_n)$, where each $\omega_i$ is either heads or
tails. If the tuple $\omega$ has k heads and $n - k$ tails, then $P(\omega) = p^k q^{n-k}$, regardless of
the positions of the heads and tails in the tuple.

For each $k = 0, \ldots, n$, let $A_k$ be the event that our coin comes up heads exactly
k times. As a set, $A_k$ consists of all those tuples in the sample space with exactly
k heads, and so

$$|A_k| = \binom{n}{k},$$

from which it follows that

$$P[A_k] = \binom{n}{k} p^k q^{n-k}.$$

If our coin is a fair coin, so that $p = q = 1/2$, then P is the uniform distribution on
$\Omega$, and for each $k = 0, \ldots, n$, we have

$$P[A_k] = \binom{n}{k} 2^{-n}.$$

□
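The product construction and the formula for $P[A_k]$ can be checked together by enumerating all $2^n$ outcomes. In this sketch, `product_distribution` is a hypothetical helper that implements the definition of $P_1 \cdots P_n$ for dict-based distributions. The coin bias p = 1/3 and the value n = 5 are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import comb

def product_distribution(*dists):
    """Product distribution P_1 ... P_n on the Cartesian product of the sample spaces:
    (w_1, ..., w_n) -> P_1(w_1) * ... * P_n(w_n)."""
    result = {}
    for outcome in product(*(d.items() for d in dists)):
        omega = tuple(w for w, _ in outcome)
        weight = Fraction(1)
        for _, pw in outcome:
            weight *= pw
        result[omega] = weight
    return result

p = Fraction(1, 3)                       # hypothetical bias: heads with probability 1/3
coin = {"heads": p, "tails": 1 - p}      # a Bernoulli trial (Example 8.3)
n = 5
P = product_distribution(*([coin] * n))  # P = P_1^n

assert sum(P.values()) == 1              # P really is a probability distribution
for k in range(n + 1):
    p_Ak = sum(w for omega, w in P.items() if omega.count("heads") == k)
    assert p_Ak == comb(n, k) * p**k * (1 - p) ** (n - k)
print(len(P))  # 32 outcomes, i.e., 2^n
```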


Suppose $P : \Omega \to [0, 1]$ is a probability distribution. The support of P is defined
to be the set $\{\omega \in \Omega : P(\omega) \neq 0\}$. Now consider another probability distribution
$P' : \Omega' \to [0, 1]$. Of course, these two distributions are equal if and only if $\Omega = \Omega'$
and $P(\omega) = P'(\omega)$ for all $\omega \in \Omega$. However, it is natural and convenient to have a
more relaxed notion of equality. We shall say that P and P' are essentially equal if
the restriction of P to its support is equal to the restriction of P' to its support. For
example, if P is the probability distribution on $\{1, 2, 3, 4\}$ that assigns probability
1/3 to 1, 2, and 3, and probability 0 to 4, we may say that P is essentially the
uniform distribution on $\{1, 2, 3\}$.


Exercise 8.1. Show that $P[A \cap B]\, P[A \cup B] \le P[A]\, P[B]$ for all events A, B.

Exercise 8.2. Suppose A, B, C are events such that $A \cap \overline{C} = B \cap \overline{C}$. Show that
$|P[A] - P[B]| \le P[C]$.


Exercise 8.3. Let m be a positive integer, and let $\alpha(m)$ be the probability that a
number chosen at random from $\{1, \ldots, m\}$ is divisible by either 4, 5, or 6. Write
down an exact formula for $\alpha(m)$, and also show that $\alpha(m) = 14/30 + O(1/m)$.


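Exercise 8.4. This exercise asks you to generalize Boole’s inequality (8.6),
proving Bonferroni’s inequalities. Let $\{A_i\}_{i \in I}$ be a finite family of events, where
$n := |I|$. For $m = 0, \ldots, n$, define

$$\alpha_m := \sum_{k=1}^{m} (-1)^{k-1} \sum_{\substack{J \subseteq I \\ |J| = k}} P\Big[\bigcap_{j \in J} A_j\Big].$$

Also, define

$$\alpha := P\Big[\bigcup_{i \in I} A_i\Big].$$

Show that $\alpha \le \alpha_m$ if m is odd, and $\alpha \ge \alpha_m$ if m is even. Hint: use induction on n.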


8.2 Conditional probability and independence 

Let P be a probability distribution on a sample space $\Omega$.

For a given event $B \subseteq \Omega$ with $P[B] \neq 0$, and for $\omega \in \Omega$, let us define

$$P(\omega \mid B) := \begin{cases} P(\omega)/P[B] & \text{if } \omega \in B, \\ 0 & \text{otherwise.} \end{cases}$$

Viewing B as fixed, the function $P(\cdot \mid B)$ is a new probability distribution on the
sample space $\Omega$, called the conditional distribution (derived from P) given B.

Intuitively, $P(\cdot \mid B)$ has the following interpretation. Suppose a random experiment
produces an outcome according to the distribution P. Further, suppose we
learn that the event B has occurred, but nothing else about the outcome. Then the
distribution $P(\cdot \mid B)$ assigns new probabilities to all possible outcomes, reflecting
the partial knowledge that the event B has occurred.
For a given event $A \subseteq \Omega$, its probability with respect to the conditional distribution
given B is

$$P[A \mid B] = \sum_{\omega \in A} P(\omega \mid B) = \frac{P[A \cap B]}{P[B]}.$$

The value $P[A \mid B]$ is called the conditional probability of A given B. Again,
the intuition is that this is the probability that the event A occurs, given the partial 
knowledge that the event B has occurred. 

For events A and B, if $P[A \cap B] = P[A]\, P[B]$, then A and B are called independent
events. If $P[B] \neq 0$, one easily sees that A and B are independent if
and only if $P[A \mid B] = P[A]$; intuitively, independence means that the partial
knowledge that event B has occurred does not affect the likelihood that A occurs.

Example 8.9. Suppose P is the uniform distribution on $\Omega$, and that $B \subseteq \Omega$ with
$P[B] \neq 0$. Then the conditional distribution given B is essentially the uniform
distribution on B. □

Example 8.10. Consider again Example 8.4, where A is the event that the value
on the die is odd, and B is the event that the value of the die exceeds 2. Then as
we calculated, $P[A] = 1/2$, $P[B] = 2/3$, and $P[A \cap B] = 1/3$; thus,
$P[A \cap B] = P[A]\, P[B]$, and we conclude that A and B are independent. Indeed,
$P[A \mid B] = (1/3)/(2/3) = 1/2 = P[A]$; intuitively, given the partial knowledge that the value
on the die exceeds 2, we know it is equally likely to be either 3, 4, 5, or 6, and so
the conditional probability that it is odd is 1/2.

However, consider the event C that the value on the die exceeds 3. We have
$P[C] = 1/2$ and $P[A \cap C] = 1/6 \neq 1/4 = P[A]\, P[C]$, from which we conclude that A and C
are not independent. Indeed, $P[A \mid C] = (1/6)/(1/2) = 1/3 \neq P[A]$; intuitively,
given the partial knowledge that the value on the die exceeds 3, we know it is
equally likely to be either 4, 5, or 6, and so the conditional probability that it is odd
is just 1/3, and not 1/2. □
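Conditional probabilities are just as easy to compute exactly. The following sketch stores a distribution as a dict mapping outcomes to `Fraction` probabilities; this is an illustrative representation, not one the text prescribes. The helpers `prob` and `cond_prob` are also illustrative names. The sketch reproduces the calculations of Example 8.10 and tests independence via the defining product rule.

```python
from fractions import Fraction

die = {w: Fraction(1, 6) for w in range(1, 7)}     # fair die, as in Example 8.1

def prob(event, dist):
    return sum((dist[w] for w in event), Fraction(0))

def cond_prob(A, B, dist):
    """P[A | B] = P[A ∩ B] / P[B], defined only when P[B] != 0."""
    pB = prob(B, dist)
    if pB == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(A & B, dist) / pB

A, B, C = {1, 3, 5}, {3, 4, 5, 6}, {4, 5, 6}
print(cond_prob(A, B, die), cond_prob(A, C, die))       # 1/2 1/3
print(prob(A & B, die) == prob(A, die) * prob(B, die))  # True: A and B independent
print(prob(A & C, die) == prob(A, die) * prob(C, die))  # False: A and C dependent
```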

Example 8.11. In Example 8.6, suppose that Alice tells Bob the sum of the two
dice before Bob makes his guess. The following table is useful for visualizing the
situation:

    6 |  7   8   9  10  11  12
    5 |  6   7   8   9  10  11
    4 |  5   6   7   8   9  10
    3 |  4   5   6   7   8   9
    2 |  3   4   5   6   7   8
    1 |  2   3   4   5   6   7
   ---+------------------------
      |  1   2   3   4   5   6

For example, suppose Alice tells Bob the sum is 4. Then what is Bob’s best strategy 
in this case? Let $D_\ell$ be the event that the sum is $\ell$, for $\ell = 2, \ldots, 12$, and consider
the conditional distribution given $D_4$. This conditional distribution is essentially
the uniform distribution on the set $\{(1, 3), (2, 2), (3, 1)\}$. The numbers 1 and 3 both
appear in two pairs, while the number 2 appears in just one pair. Therefore,

$$P[C_1 \mid D_4] = P[C_3 \mid D_4] = 2/3,$$

while

$$P[C_2 \mid D_4] = 1/3$$

and

$$P[C_4 \mid D_4] = P[C_5 \mid D_4] = P[C_6 \mid D_4] = 0.$$

Thus, if the sum is 4, Bob’s best strategy is to guess either 1 or 3, which will be 
correct with probability 2/3. 

Similarly, if the sum is 5, then we consider the conditional distribution given $D_5$,
which is essentially the uniform distribution on $\{(1, 4), (2, 3), (3, 2), (4, 1)\}$. In this
case, Bob should choose one of the numbers $k = 1, \ldots, 4$, each of which will be
correct with probability $P[C_k \mid D_5] = 1/2$. □


Suppose $\{B_i\}_{i \in I}$ is a finite, pairwise disjoint family of events, whose union is
$\Omega$. Now consider an arbitrary event A. Since $\{A \cap B_i\}_{i \in I}$ is a pairwise disjoint
family of events whose union is A, Boole’s equality (8.7) implies

$$P[A] = \sum_{i \in I} P[A \cap B_i]. \qquad (8.9)$$

Furthermore, if each $B_i$ occurs with non-zero probability (so that, in particular,
$\{B_i\}_{i \in I}$ is a partition of $\Omega$), then we have

$$P[A] = \sum_{i \in I} P[A \mid B_i]\, P[B_i]. \qquad (8.10)$$

If, in addition, $P[A] \neq 0$, then for each $j \in I$, we have

$$P[B_j \mid A] = \frac{P[A \cap B_j]}{P[A]} = \frac{P[A \mid B_j]\, P[B_j]}{\sum_{i \in I} P[A \mid B_i]\, P[B_i]}. \qquad (8.11)$$

Equations (8.9) and (8.10) are sometimes called the law of total probability, while
equation (8.11) is known as Bayes’ theorem. Equation (8.10) (resp., (8.11)) is
useful for computing or estimating $P[A]$ (resp., $P[B_j \mid A]$) by conditioning on the
events $B_i$.


Example 8.12. Let us continue with Example 8.11, and compute Bob’s overall
probability of winning, assuming he follows an optimal strategy. If the sum is 2 or
12, clearly there is only one sensible choice for Bob to make, and it will certainly
be correct. If the sum is any other number $\ell$, and there are $N_\ell$ pairs in the sample
space that sum to that number, then there will always be a value that appears in
exactly 2 of these $N_\ell$ pairs, and Bob should choose such a value (see the diagram
in Example 8.11). Indeed, this is achieved by the simple rule of choosing the value
1 if $\ell \le 7$, and the value 6 if $\ell > 7$. This is an optimal strategy for Bob, and if C is
the event that Bob wins following this strategy, then by total probability (8.10), we
have

$$P[C] = \sum_{\ell=2}^{12} P[C \mid D_\ell]\, P[D_\ell].$$

Moreover,

$$P[C \mid D_2]\, P[D_2] = 1 \cdot \frac{1}{36} = \frac{1}{36}, \qquad P[C \mid D_{12}]\, P[D_{12}] = 1 \cdot \frac{1}{36} = \frac{1}{36},$$

and for $\ell = 3, \ldots, 11$, we have

$$P[C \mid D_\ell]\, P[D_\ell] = \frac{2}{N_\ell} \cdot \frac{N_\ell}{36} = \frac{1}{18}.$$

Therefore,

$$P[C] = \frac{1}{36} + \frac{1}{36} + \frac{9}{18} = \frac{10}{18}.$$

□
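Examples 8.11 and 8.12 can be verified by brute force. The following sketch uses names of our own choosing. For each possible sum $\ell$, it picks a value k that maximizes $P[C_k \mid D_\ell]$, and it then combines the results by total probability (8.10). When several values tie, it takes the smallest one. The overall winning probability does not depend on how ties are broken.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # uniform distribution on pairs (s, t)

def best_guess(total):
    """Return (k, P[C_k | D_total]) for a value k maximizing that probability;
    ties are broken in favor of the smallest k."""
    pairs = [st for st in omega if sum(st) == total]
    best = max(range(1, 7), key=lambda k: sum(1 for st in pairs if k in st))
    hits = sum(1 for st in pairs if best in st)
    return best, Fraction(hits, len(pairs))

# Total probability (8.10): P[C] = sum over l of P[C | D_l] * P[D_l].
win = Fraction(0)
for total in range(2, 13):
    _, p_cond = best_guess(total)
    p_sum = Fraction(sum(1 for st in omega if sum(st) == total), len(omega))
    win += p_cond * p_sum

print(win)  # 5/9, i.e., 10/18
```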


Example 8.13. Suppose that the rate of incidence of disease X in the overall population
is 1%. Also suppose that there is a test for disease X; however, the test is
not perfect: it has a 5% false positive rate (i.e., 5% of healthy patients test positive
for the disease), and a 2% false negative rate (i.e., 2% of sick patients test negative
for the disease). A doctor gives the test to a patient and it comes out positive. How
should the doctor advise his patient? In particular, what is the probability that the
patient actually has disease X, given a positive test result?

Amazingly, many trained doctors will say the probability is 95%, since the test
has a false positive rate of 5%. However, this conclusion is completely wrong.

Let A be the event that the test is positive and let B be the event that the patient
has disease X. The relevant quantity that we need to estimate is $P[B \mid A]$; that is,
the probability that the patient has disease X, given a positive test result. We use
Bayes’ theorem to do this:

$$P[B \mid A] = \frac{P[A \mid B]\, P[B]}{P[A \mid B]\, P[B] + P[A \mid \overline{B}]\, P[\overline{B}]} = \frac{0.98 \cdot 0.01}{0.98 \cdot 0.01 + 0.05 \cdot 0.99} \approx 0.17.$$


Thus, the chances that the patient has disease X given a positive test result are just 
17%. The correct intuition here is that it is much more likely to get a false positive 
than it is to actually have the disease. 

Of course, the real world is a bit more complicated than this example suggests: 
the doctor may be giving the patient the test because other risk factors or symptoms
may suggest that the patient is more likely to have the disease than a random
member of the population, in which case the above analysis does not apply. □ 
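The Bayes computation in this example is simple enough to wrap in a function. Doing so makes it easy to see how strongly the answer depends on the 1% prior. The following is a minimal floating-point sketch; the function name and argument names are illustrative.

```python
def posterior(prior, p_pos_given_sick, p_pos_given_healthy):
    """Bayes' theorem (8.11) for the partition {B, complement of B}:
    P[B | A] = P[A | B] P[B] / (P[A | B] P[B] + P[A | not B] P[not B])."""
    numerator = p_pos_given_sick * prior
    return numerator / (numerator + p_pos_given_healthy * (1 - prior))

# Example 8.13: 1% incidence, 2% false negatives (so P[A | B] = 0.98),
# and 5% false positives (so P[A | not B] = 0.05).
print(round(posterior(0.01, 0.98, 0.05), 3))  # 0.165
```

For comparison, with a prior of 0.5 instead of 0.01 the same test would give a posterior of about 0.95.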

Example 8.14. This example is based on the TV game show “Let’s make a deal,” 
which was popular in the 1970’s. In this game, a contestant chooses one of three 
doors. Behind two doors is a “zonk,” that is, something amusing but of little or 
no value, such as a goat, and behind one of the doors is a “grand prize,” such 
as a car or vacation package. We may assume that the door behind which the
grand prize is placed is chosen at random from among the three doors, with equal 
probability. After the contestant chooses a door, the host of the show, Monty Hall, 
always reveals a zonk behind one of the two doors not chosen by the contestant. 
The contestant is then given a choice: either stay with his initial choice of door, or 
switch to the other unopened door. After the contestant finalizes his decision on 
which door to choose, that door is opened and he wins whatever is behind it. The 
question is, which strategy is better for the contestant: to stay or to switch? 

Let us evaluate the two strategies. If the contestant always stays with his initial 
selection, then it is clear that his probability of success is exactly 1/3.

Now consider the strategy of always switching. Let B be the event that the
contestant’s initial choice was correct, and let A be the event that the contestant
wins the grand prize. On the one hand, if the contestant’s initial choice was correct,
then switching will certainly lead to failure (in this case, Monty has two doors to
choose from, but his choice does not affect the outcome). Thus, $P[A \mid B] = 0$.
On the other hand, suppose that the contestant’s initial choice was incorrect, so
that one of the zonks is behind the initially chosen door. Since Monty reveals the
other zonk, switching will lead with certainty to success. Thus, $P[A \mid \overline{B}] = 1$.
Furthermore, it is clear that $P[B] = 1/3$. So using total probability (8.10), we
compute

$$P[A] = P[A \mid B]\, P[B] + P[A \mid \overline{B}]\, P[\overline{B}] = 0 \cdot (1/3) + 1 \cdot (2/3) = 2/3.$$

Thus, the “stay” strategy has a success probability of 1/3, while the “switch” 
strategy has a success probability of 2/3. So it is better to switch than to stay. 

Of course, real life is a bit more complicated. Monty did not always reveal a 
zonk and offer a choice to switch. Indeed, if Monty only revealed a zonk when 
the contestant had chosen the correct door, then switching would certainly be the 
wrong strategy. However, if Monty’s choice itself was a random decision made 
independently of the contestant’s initial choice, then switching is again the pre- 
ferred strategy. □ 
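The analysis can be double-checked by enumerating the game tree with exact probabilities. The following sketch makes two assumptions beyond the text. First, the contestant's initial pick is fixed at door 0; this is harmless by symmetry, since the prize is placed uniformly at random. Second, when Monty has two zonk doors available, he picks one uniformly at random; the text notes that this choice does not affect the outcome.

```python
from fractions import Fraction

def monty_hall(switch: bool) -> Fraction:
    """Exact probability of winning the grand prize, assuming the prize door is
    uniform over three doors, the contestant first picks door 0, and Monty opens a
    zonk door other than the contestant's, uniformly when two such doors exist."""
    choice = 0
    win = Fraction(0)
    for prize in range(3):
        p_prize = Fraction(1, 3)
        openable = [d for d in range(3) if d != choice and d != prize]
        for opened in openable:
            p_open = Fraction(1, len(openable))
            if switch:
                final = next(d for d in range(3) if d not in (choice, opened))
            else:
                final = choice
            if final == prize:
                win += p_prize * p_open
    return win

print(monty_hall(switch=False), monty_hall(switch=True))  # 1/3 2/3
```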

We next generalize the notion of independence from pairs of events to families
of events. Let $\{A_i\}_{i \in I}$ be a finite family of events. For a given positive integer k,
we say that the family $\{A_i\}_{i \in I}$ is k-wise independent if the following holds:

$$P\Big[\bigcap_{j \in J} A_j\Big] = \prod_{j \in J} P[A_j] \quad \text{for all } J \subseteq I \text{ with } |J| \le k.$$

The family $\{A_i\}_{i \in I}$ is called pairwise independent if it is 2-wise independent.
Equivalently, pairwise independence means that for all $i, j \in I$ with $i \neq j$, we have
$P[A_i \cap A_j] = P[A_i]\, P[A_j]$, or put yet another way, that for all $i, j \in I$ with $i \neq j$,
the events $A_i$ and $A_j$ are independent.

The family $\{A_i\}_{i \in I}$ is called mutually independent if it is k-wise independent
for all positive integers k. Equivalently, mutual independence means that

$$P\Big[\bigcap_{j \in J} A_j\Big] = \prod_{j \in J} P[A_j] \quad \text{for all } J \subseteq I.$$

If $n := |I| > 0$, mutual independence is equivalent to n-wise independence; moreover,
if $0 < k \le n$, then $\{A_i\}_{i \in I}$ is k-wise independent if and only if $\{A_j\}_{j \in J}$ is
mutually independent for every $J \subseteq I$ with $|J| = k$.

In defining independence, the choice of the index set I plays no real role, and 
we can rename elements of I as convenient. 

Example 8.15. Suppose we toss a fair coin three times, which we formally model
using the uniform distribution on the set of all 8 possible outcomes of the three
coin tosses: (heads, heads, heads), (heads, heads, tails), etc., as in Example 8.8.
For $i = 1, 2, 3$, let $A_i$ be the event that the ith toss comes up heads. Then $\{A_i\}_{i=1}^3$
is a mutually independent family of events, where each individual $A_i$ occurs with
probability 1/2.

Now let $B_{12}$ be the event that the first and second tosses agree (i.e., both heads
or both tails), let $B_{13}$ be the event that the first and third tosses agree, and let $B_{23}$
be the event that the second and third tosses agree. Then the family of events
$B_{12}, B_{13}, B_{23}$ is pairwise independent, but not mutually independent. Indeed, the
probability that any given individual event occurs is 1/2, and the probability that
any given pair of events occurs is 1/4; however, the probability that all three events
occur is also 1/4, since if any two events occur, then so does the third. □
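The distinction between pairwise and mutual independence in Example 8.15 can be confirmed by listing all 8 outcomes. The following is a small sketch; the names are illustrative, and 'H' and 'T' stand for heads and tails.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product("HT", repeat=3))     # the 8 equally likely outcomes

def prob(event):
    return Fraction(len(event), len(omega))

def agree(i, j):
    """The event that tosses i and j (0-based) have the same result."""
    return {w for w in omega if w[i] == w[j]}

B12, B13, B23 = agree(0, 1), agree(0, 2), agree(1, 2)

pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in combinations([B12, B13, B23], 2))
mutual = prob(B12 & B13 & B23) == prob(B12) * prob(B13) * prob(B23)
print(pairwise, mutual)  # True False
```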

We close this section with some simple facts about independence of events and 
their complements. 

Theorem 8.2. If A and B are independent events, then so are A and $\overline{B}$.

Proof. We have

$$\begin{aligned}
P[A] &= P[A \cap B] + P[A \cap \overline{B}] && \text{(by total probability (8.9))}\\
&= P[A]\, P[B] + P[A \cap \overline{B}] && \text{(since } A \text{ and } B \text{ are independent).}
\end{aligned}$$

Therefore,

$$P[A \cap \overline{B}] = P[A] - P[A]\, P[B] = P[A](1 - P[B]) = P[A]\, P[\overline{B}].$$

□

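This theorem implies that

$$\begin{aligned}
A \text{ and } B \text{ are independent} &\iff A \text{ and } \overline{B} \text{ are independent}\\
&\iff \overline{A} \text{ and } B \text{ are independent}\\
&\iff \overline{A} \text{ and } \overline{B} \text{ are independent}.
\end{aligned}$$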

The following theorem generalizes this result to families of events. It says that
if a family of events is k-wise independent, then the family obtained by complementing
any number of members of the given family is also k-wise independent.

Theorem 8.3. Let $\{A_i\}_{i \in I}$ be a finite, k-wise independent family of events. Let
J be a subset of I, and for each $i \in I$, define $A_i' := A_i$ if $i \in J$, and $A_i' := \overline{A_i}$ if
$i \notin J$. Then $\{A_i'\}_{i \in I}$ is also k-wise independent.

Proof. It suffices to prove the theorem for the case where $J = I \setminus \{d\}$, for an
arbitrary $d \in I$: this allows us to complement any single member of the family
that we wish, without affecting independence; by repeating the procedure, we can
complement any number of them.

To this end, it will suffice to show the following: if $J \subseteq I$, $|J| < k$, $d \in I \setminus J$,
and $A_J := \bigcap_{j \in J} A_j$, we have

$$P[\overline{A_d} \cap A_J] = (1 - P[A_d]) \prod_{j \in J} P[A_j]. \qquad (8.12)$$

Using total probability (8.9), along with the independence hypothesis (twice), we
have

$$\prod_{j \in J} P[A_j] = P[A_J] = P[A_d \cap A_J] + P[\overline{A_d} \cap A_J] = P[A_d] \cdot \prod_{j \in J} P[A_j] + P[\overline{A_d} \cap A_J],$$

from which (8.12) follows immediately. □


Exercise 8.5. For events $A_1, \ldots, A_n$, define $\alpha_1 := P[A_1]$, and for $i = 2, \ldots, n$,
define $\alpha_i := P[A_i \mid A_1 \cap \cdots \cap A_{i-1}]$ (assume that $P[A_1 \cap \cdots \cap A_{n-1}] \neq 0$). Show
that $P[A_1 \cap \cdots \cap A_n] = \alpha_1 \cdots \alpha_n$.

Exercise 8.6. Let B be an event, and let $\{B_i\}_{i \in I}$ be a finite, pairwise disjoint
family of events whose union is B. Generalizing the law of total probability
(equations (8.9) and (8.10)), show that for every event A, we have
$P[A \cap B] = \sum_{i \in I} P[A \cap B_i]$, and if $P[B] \neq 0$ and $I^* := \{i \in I : P[B_i] \neq 0\}$, then

$$P[A \mid B]\, P[B] = \sum_{i \in I^*} P[A \mid B_i]\, P[B_i].$$

Also show that if $P[A \mid B_i] \le \alpha$ for each $i \in I^*$, then $P[A \mid B] \le \alpha$.

Exercise 8.7. Let B be an event with $P[B] \neq 0$, and let $\{C_i\}_{i \in I}$ be a finite, pairwise
disjoint family of events whose union contains B. Again, generalizing the law
of total probability, show that for every event A, if $I^* := \{i \in I : P[B \cap C_i] \neq 0\}$,
then we have

$$P[A \mid B] = \sum_{i \in I^*} P[A \mid B \cap C_i]\, P[C_i \mid B].$$

Exercise 8.8. Three fair coins are tossed. Let A be the event that at least two
coins are heads. Let B be the event that the number of heads is odd. Let C be the
event that the third coin is heads. Are A and B independent? A and C? B and C?

Exercise 8.9. Consider again the situation in Example 8.11, but now suppose 
that Alice only tells Bob the value of the sum of the two dice modulo 6. Describe 
an optimal strategy for Bob, and calculate his overall probability of winning. 

Exercise 8.10. Consider again the situation in Example 8.13, but now suppose 
that the patient is visiting the doctor because he has symptom Y. Furthermore, it 
is known that everyone who has disease X exhibits symptom Y, while 10% of the 
population overall exhibits symptom Y. Assuming that the accuracy of the test 
is not affected by the presence of symptom Y, how should the doctor advise his 
patient should the test come out positive? 

Exercise 8.11. This exercise develops an alternative proof, based on probability
theory, of Theorem 2.11. Let n be a positive integer and consider an experiment
in which a number a is chosen uniformly at random from $\{0, \ldots, n-1\}$. If
$n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of n, let $A_i$ be the event that a is divisible
by $p_i$, for $i = 1, \ldots, r$.

(a) Show that $\varphi(n)/n = P[\overline{A_1} \cap \cdots \cap \overline{A_r}]$, where $\varphi$ is Euler’s phi function.

(b) Show that if $J \subseteq \{1, \ldots, r\}$, then

$$P\Big[\bigcap_{j \in J} A_j\Big] = 1 \Big/ \prod_{j \in J} p_j.$$

Conclude that $\{A_i\}_{i=1}^r$ is mutually independent, and that $P[A_i] = 1/p_i$, for
each $i = 1, \ldots, r$.

(c) Using part (b), deduce that

$$P[\overline{A_1} \cap \cdots \cap \overline{A_r}] = \prod_{i=1}^r (1 - 1/p_i).$$

(d) Combine parts (a) and (c) to derive the result of Theorem 2.11 that

$$\varphi(n) = n \prod_{i=1}^r (1 - 1/p_i).$$
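Before working through the exercise, it may be reassuring to check parts (b) through (d) numerically for one value of n. The following sketch uses the hypothetical choice n = 360 = 2^3 · 3^2 · 5 and exact rational arithmetic. It checks instances of the claims and is no substitute for the proofs.

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, prod

n = 360                    # hypothetical example: 360 = 2^3 * 3^2 * 5
primes = [2, 3, 5]

def prob(pred):
    """Probability that a uniform a in {0, ..., n-1} satisfies pred."""
    return Fraction(sum(1 for a in range(n) if pred(a)), n)

# (b): P[intersection of A_j, j in J] = 1 / prod(p_j) for every J (J empty included).
for k in range(len(primes) + 1):
    for J in combinations(primes, k):
        assert prob(lambda a: all(a % pj == 0 for pj in J)) == Fraction(1, prod(J))

# (a), (c), (d): phi(n)/n = P[none of the A_i occurs] = prod(1 - 1/p_i).
phi = sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)
none = prob(lambda a: all(a % pi != 0 for pi in primes))
print(phi, none * n, Fraction(phi, n) == prod(1 - Fraction(1, pi) for pi in primes))
# 96 96 True
```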

8.3 Random variables 

It is sometimes convenient to associate a real number, or other mathematical object, 
with each outcome of a random experiment. The notion of a random variable 
formalizes this idea. 

Let P be a probability distribution on a sample space $\Omega$. A random variable
X is a function $X : \Omega \to S$, where S is some set, and we say that X takes values
in S. We do not require that the values taken by X are real numbers, but if this
is the case, we say that X is real valued. For $s \in S$, “$X = s$” denotes the event
$\{\omega \in \Omega : X(\omega) = s\}$. It is immediate from this definition that

$$P[X = s] = \sum_{\omega \in X^{-1}(\{s\})} P(\omega).$$

More generally, for any predicate $\phi$ on S, we may write “$\phi(X)$” as shorthand for the
event $\{\omega \in \Omega : \phi(X(\omega))\}$. When we speak of the image of X, we simply mean its
image in the usual function-theoretic sense, that is, the set $X(\Omega) = \{X(\omega) : \omega \in \Omega\}$.
While a random variable is simply a function on the sample space, any discussion 
of its properties always takes place relative to a particular probability distribution, 
which may be implicit from context. 

One can easily combine random variables to define new random variables. Suppose
$X_1, \ldots, X_n$ are random variables, where $X_i : \Omega \to S_i$ for $i = 1, \ldots, n$. Then
$(X_1, \ldots, X_n)$ denotes the random variable that maps $\omega \in \Omega$ to
$(X_1(\omega), \ldots, X_n(\omega)) \in S_1 \times \cdots \times S_n$. If $f : S_1 \times \cdots \times S_n \to T$ is a function, then
$f(X_1, \ldots, X_n)$ denotes the random variable that maps $\omega \in \Omega$ to $f(X_1(\omega), \ldots, X_n(\omega))$.
If f is applied using a special notation, the same notation may be applied to denote
the resulting random variable; for example, if X and Y are random variables taking
values in a set S, and $\star$ is a binary operation on S, then $X \star Y$ denotes the random
variable that maps $\omega \in \Omega$ to $X(\omega) \star Y(\omega) \in S$.

Let X be a random variable whose image is S. The variable X determines a
probability distribution $P_X : S \to [0, 1]$ on the set S, where $P_X(s) := P[X = s]$ for
each $s \in S$. We call $P_X$ the distribution of X. If $P_X$ is the uniform distribution on
S, then we say that X is uniformly distributed over S.

Suppose X and Y are random variables that take values in a set S. If
$P[X = s] = P[Y = s]$ for all $s \in S$, then the distributions of X and Y are essentially
equal even if their images are not identical.

Example 8.16. Again suppose we roll two dice, and model this experiment as the
uniform distribution on Ω := {1, …, 6} × {1, …, 6}. We can define the random
variable X that takes the value of the first die, and the random variable Y that takes
the value of the second; formally, X and Y are functions on Ω, where

$$X(s, t) := s \quad\text{and}\quad Y(s, t) := t \quad \text{for } (s, t) \in \Omega.$$

For each value s ∈ {1, …, 6}, the event X = s is {(s, 1), …, (s, 6)}, and so
P[X = s] = 6/36 = 1/6. Thus, X is uniformly distributed over {1, …, 6}. Likewise,
Y is uniformly distributed over {1, …, 6}, and the random variable (X, Y) is
uniformly distributed over Ω. We can also define the random variable Z := X + Y,
which formally is the function on the sample space defined by

$$Z(s, t) := s + t \quad \text{for } (s, t) \in \Omega.$$

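The image of Z is {2, …, 12}, and its distribution is given by the following table:

$$\begin{array}{c|ccccccccccc}
u & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \hline
\mathsf{P}[Z = u] & 1/36 & 2/36 & 3/36 & 4/36 & 5/36 & 6/36 & 5/36 & 4/36 & 3/36 & 2/36 & 1/36
\end{array}$$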

Example 8.17. If A is an event, we may define a random variable X as follows:
X := 1 if the event A occurs, and X := 0 otherwise. The variable X is called the
indicator variable for A. Formally, X is the function that maps ω ∈ A to 1, and
ω ∈ Ω \ A to 0; that is, X is simply the characteristic function of A. The distribution
of X is that of a Bernoulli trial: P[X = 1] = P[A] and P[X = 0] = 1 − P[A].

It is not hard to see that 1 − X is the indicator variable for the complementary event
Ā := Ω \ A. Now suppose B is another event, with indicator variable Y. Then it is also
not hard to see that XY is the indicator variable for A ∩ B, and that X + Y − XY is the
indicator variable for A ∪ B; in particular, if A ∩ B = ∅, then X + Y is the indicator
variable for A ∪ B. □

Example 8.18. Consider again Example 8.8, where we have a coin that comes up
heads with probability p, and tails with probability q := 1 − p, and we toss it n times.
For each i = 1, …, n, let A_i be the event that the ith toss comes up heads, and let
X_i be the corresponding indicator variable. Let us also define X := X_1 + ⋯ + X_n,
which represents the total number of tosses that come up heads. The image of X
is {0, …, n}. By the calculations made in Example 8.8, for each k = 0, …, n, we
have

$$\mathsf{P}[X = k] = \binom{n}{k} p^k q^{n-k}.$$

The distribution of the random variable X is called a binomial distribution. Such
a distribution is parameterized by the success probability p of the underlying
Bernoulli trial, and by the number of times n the trial is repeated. □
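The binomial formula can be checked against a direct enumeration of all 2^n toss sequences, each weighted by p^(number of heads) q^(number of tails). The sketch below does this with exact rational arithmetic; the parameters n = 5 and p = 1/3 are arbitrary illustrative choices, the function names are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_by_enumeration(n, p):
    """P[X = k], X = number of heads in n independent tosses, computed by
    summing over all 2^n outcome sequences (1 = heads, 0 = tails)."""
    q = 1 - p
    dist = [Fraction(0)] * (n + 1)
    for outcome in product((0, 1), repeat=n):
        heads = sum(outcome)
        dist[heads] += p**heads * q**(n - heads)  # probability of this sequence
    return dist


def binomial_formula(n, p):
    """P[X = k] = C(n, k) p^k q^(n - k) for k = 0, ..., n."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


p = Fraction(1, 3)  # illustrative success probability
print(binomial_by_enumeration(5, p) == binomial_formula(5, p))  # True
```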


Uniform distributions are very nice, simple distributions. It is therefore good to
have simple criteria that ensure that certain random variables have uniform
distributions. The next theorem provides one such criterion. We need a definition: if S
and T are finite sets, then we say that a given function f : S → T is a regular
function if every element in the image of f has the same number of pre-images
under f.


Theorem 8.4. Suppose f : S → T is a surjective, regular function, and that X
is a random variable that is uniformly distributed over S. Then f(X) is uniformly
distributed over T.


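Proof. The assumption that f is surjective and regular implies that for every t ∈ T,
the set S_t := f^{-1}({t}) has size |S|/|T|. So, for each t ∈ T, working directly from
the definitions, we have

$$\begin{aligned}
\mathsf{P}[f(X) = t] &= \sum_{\omega \in X^{-1}(S_t)} \mathsf{P}(\omega)
= \sum_{s \in S_t} \sum_{\omega \in X^{-1}(\{s\})} \mathsf{P}(\omega)
= \sum_{s \in S_t} \mathsf{P}[X = s] \\
&= \sum_{s \in S_t} 1/|S| = (|S|/|T|)/|S| = 1/|T|. \qquad \square
\end{aligned}$$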

As a corollary, we have: 

Theorem 8.5. Suppose that ρ : G → G′ is a surjective homomorphism of finite
abelian groups G and G′, and that X is a random variable that is uniformly
distributed over G. Then ρ(X) is uniformly distributed over G′.

Proof. It suffices to show that ρ is regular. Recall that the kernel K of ρ is a
subgroup of G, and that for every g′ ∈ G′, the set ρ^{-1}({g′}) is a coset of K (see
Theorem 6.19); moreover, every coset of K has the same size (see Theorem 6.14).
These facts imply that ρ is regular. □

Example 8.19. Let us continue with Example 8.16. Recall that for a given integer
a, and positive integer n, [a]_n ∈ ℤ_n denotes the residue class of a modulo n. Let
us define X′ := [X]_6 and Y′ := [Y]_6. It is not hard to see that both X′ and Y′ are
uniformly distributed over ℤ_6, while (X′, Y′) is uniformly distributed over ℤ_6 × ℤ_6.
Let us define Z′ := X′ + Y′ (where addition here is in ℤ_6). We claim that Z′ is
uniformly distributed over ℤ_6. This follows immediately from the fact that the map
that sends (a, b) ∈ ℤ_6 × ℤ_6 to a + b ∈ ℤ_6 is a surjective group homomorphism (see
Example 6.45). Further, we claim that (X′, Z′) is uniformly distributed over ℤ_6 × ℤ_6.
This follows immediately from the fact that the map that sends (a, b) ∈ ℤ_6 × ℤ_6
to (a, a + b) ∈ ℤ_6 × ℤ_6 is a surjective group homomorphism (indeed, it is a group
isomorphism). □
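The claims of Example 8.19 are small enough to verify exhaustively. This sketch represents the residue classes of ℤ_6 by the integers 0, …, 5 (an encoding assumption, not part of the text), pushes the uniform distribution on the 36 dice outcomes through Z′ and (X′, Z′), and checks uniformity. All helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Two fair dice; each outcome (s, t) has probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))


def dist(rv):
    """Distribution of rv under the uniform distribution on OMEGA."""
    counts = Counter(rv(w) for w in OMEGA)
    return {v: Fraction(c, len(OMEGA)) for v, c in counts.items()}


def is_uniform(d, size):
    """True if d puts probability exactly 1/size on each of size values."""
    return len(d) == size and all(pr == Fraction(1, size) for pr in d.values())


def X_res(w):
    return w[0] % 6                   # X' := [X]_6, encoded as 0..5


def Y_res(w):
    return w[1] % 6                   # Y' := [Y]_6


def Z_res(w):
    return (X_res(w) + Y_res(w)) % 6  # Z' := X' + Y' in Z_6


print(is_uniform(dist(Z_res), 6))                            # True
print(is_uniform(dist(lambda w: (X_res(w), Z_res(w))), 36))  # True
print(is_uniform(dist(lambda w: w[0] + w[1]), 11))           # False: Z itself is not uniform
```

The last line is a reminder that the reduction modulo 6 is what produces uniformity; the plain sum Z from Example 8.16 is not uniform.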

Let X be a random variable whose image is S. Let B be an event with P[B] ≠ 0.
The conditional distribution of X given B is defined to be the distribution of X
relative to the conditional distribution P(· | B), that is, the distribution
P_{X|B} : S → [0, 1] defined by P_{X|B}(s) := P[X = s | B] for s ∈ S.

Suppose X and Y are random variables, with images S and T, respectively. We
say X and Y are independent if for all s ∈ S and all t ∈ T, the events X = s and
Y = t are independent, which is to say,

$$\mathsf{P}[(X = s) \cap (Y = t)] = \mathsf{P}[X = s]\,\mathsf{P}[Y = t].$$

Equivalently, X and Y are independent if and only if the distribution of (X, Y) is
essentially equal to the product of the distribution of X and the distribution of Y. As
a special case, if X is uniformly distributed over S, and Y is uniformly distributed
over T, then X and Y are independent if and only if (X, Y) is uniformly distributed
over S × T.

Independence can also be characterized in terms of conditional probabilities. 
From the definitions, it is immediate that X and Y are independent if and only if for
all values t taken by Y with non-zero probability, we have

$$\mathsf{P}[X = s \mid Y = t] = \mathsf{P}[X = s]$$

for all s ∈ S; that is, the conditional distribution of X given Y = t is the same
as the distribution of X. From this point of view, an intuitive interpretation of 
independence is that information about the value of one random variable does not 
reveal any information about the value of the other. 

Example 8.20. Let us continue with Examples 8.16 and 8.19. The random variables
X and Y are independent: each is uniformly distributed over {1, …, 6}, and
(X, Y) is uniformly distributed over {1, …, 6} × {1, …, 6}. Let us calculate the
conditional distribution of X given Z = 4. We have P[X = s | Z = 4] = 1/3
for s = 1, 2, 3, and P[X = s | Z = 4] = 0 for s = 4, 5, 6. Thus, the conditional
distribution of X given Z = 4 is essentially the uniform distribution on
{1, 2, 3}. Let us calculate the conditional distribution of Z given X = 1. We have
P[Z = u | X = 1] = 1/6 for u = 2, …, 7, and P[Z = u | X = 1] = 0 for
u = 8, …, 12. Thus, the conditional distribution of Z given X = 1 is essentially
the uniform distribution on {2, …, 7}. In particular, it is clear that X and Z are
not independent. The random variables X′ and Y′ are independent, as are X′ and
Z′; each of X′, Y′, and Z′ is uniformly distributed over ℤ_6, and each of (X′, Y′) and
(X′, Z′) is uniformly distributed over ℤ_6 × ℤ_6. □


We now generalize the notion of independence to families of random variables. 
Let {X_i}_{i∈I} be a finite family of random variables. Let us call a corresponding
family of values {s_i}_{i∈I} an assignment to {X_i}_{i∈I} if s_i is in the image of X_i for
each i ∈ I. For a given positive integer k, we say that the family {X_i}_{i∈I} is k-wise
independent if for every assignment {s_i}_{i∈I} to {X_i}_{i∈I}, the family of events
{X_i = s_i}_{i∈I} is k-wise independent.

The notions of pairwise and mutual independence for random variables are
defined following the same pattern that was used for events. The family {X_i}_{i∈I} is
called pairwise independent if it is 2-wise independent, which means that for all
i, j ∈ I with i ≠ j, the variables X_i and X_j are independent. The family {X_i}_{i∈I} is
called mutually independent if it is k-wise independent for all positive integers
k. Equivalently, and more explicitly, mutual independence means that for every
assignment {s_i}_{i∈I} to {X_i}_{i∈I}, we have

$$\mathsf{P}\Big[\bigcap_{j \in J} (X_j = s_j)\Big] = \prod_{j \in J} \mathsf{P}[X_j = s_j] \quad \text{for all } J \subseteq I. \tag{8.13}$$

If n := |I| > 0, mutual independence is equivalent to n-wise independence; moreover,
if 0 < k < n, then {X_i}_{i∈I} is k-wise independent if and only if {X_j}_{j∈J} is
mutually independent for every J ⊆ I with |J| = k.


Example 8.21. Returning again to Examples 8.16, 8.19, and 8.20, we see that
the family of random variables X′, Y′, Z′ is pairwise independent, but not mutually
independent; for example,

$$\mathsf{P}\big[(X' = [0]_6) \cap (Y' = [0]_6) \cap (Z' = [0]_6)\big] = 1/6^2,$$

but

$$\mathsf{P}[X' = [0]_6] \cdot \mathsf{P}[Y' = [0]_6] \cdot \mathsf{P}[Z' = [0]_6] = 1/6^3. \qquad \square$$
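The gap between pairwise and mutual independence in Example 8.21 can be confirmed by checking the product rule for every assignment. As a modelling assumption, the sketch below takes the sample space to be the uniform distribution on pairs of residues (x, y) ∈ {0, …, 5}², which by Example 8.19 is the distribution of (X′, Y′); the names `VARIABLES`, `prob`, and `product_rule_holds` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# By Example 8.19, (X', Y') is uniform over Z_6 x Z_6, so we model outcomes as
# pairs of residues (x, y) in {0, ..., 5}^2, each with probability 1/36.
OMEGA = list(product(range(6), repeat=2))
PR = Fraction(1, len(OMEGA))

VARIABLES = {
    "X'": lambda w: w[0],
    "Y'": lambda w: w[1],
    "Z'": lambda w: (w[0] + w[1]) % 6,
}


def prob(event):
    """P[event], where event is a predicate on outcomes."""
    return sum((PR for w in OMEGA if event(w)), Fraction(0))


def product_rule_holds(names, values):
    """Check P[all names[i] == values[i]] == product of the individual probabilities."""
    joint = prob(lambda w: all(VARIABLES[n](w) == v for n, v in zip(names, values)))
    marginals = Fraction(1)
    for n, v in zip(names, values):
        marginals *= prob(lambda w: VARIABLES[n](w) == v)
    return joint == marginals


pairwise = all(
    product_rule_holds(pair, vals)
    for pair in combinations(VARIABLES, 2)
    for vals in product(range(6), repeat=2)
)
mutual = all(product_rule_holds(tuple(VARIABLES), vals)
             for vals in product(range(6), repeat=3))
print(pairwise, mutual)  # True False
```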

Example 8.22. Suppose {A_i}_{i∈I} is a finite family of events. Let {X_i}_{i∈I} be the
corresponding family of indicator variables, so that for each i ∈ I, X_i = 1 if A_i
occurs, and X_i = 0 otherwise. Theorem 8.3 immediately implies that for every
positive integer k, {A_i}_{i∈I} is k-wise independent if and only if {X_i}_{i∈I} is k-wise
independent. □


Example 8.23. Consider again Example 8.15, where we toss a fair coin 3 times.
For i = 1, 2, 3, let X_i be the indicator variable for the event A_i that the ith toss
comes up heads. Then {X_i}_{i=1}^3 is a mutually independent family of random
variables. Let Y_12 be the indicator variable for the event B_12 that tosses 1 and 2 agree;
similarly, let Y_13 be the indicator variable for the event B_13, and Y_23 the indicator
variable for B_23. Then the family of random variables Y_12, Y_13, Y_23 is pairwise
independent, but not mutually independent. □
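The claim about Y_12, Y_13, Y_23 can be checked by enumerating the 8 equally likely outcomes of three fair tosses. In this sketch, 1 encodes heads and 0 tails, and the helper names (`agree`, `product_rule_holds`) are hypothetical; the check applies the product rule to every 0/1 assignment.

```python
from fractions import Fraction
from itertools import combinations, product

# Three fair coin tosses: 8 equally likely outcomes, with 1 = heads and 0 = tails.
OMEGA = list(product((0, 1), repeat=3))


def prob(event):
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def agree(i, j):
    """Indicator variable Y_ij for the event B_ij that tosses i and j agree (1-based)."""
    return lambda w: int(w[i - 1] == w[j - 1])


Y = {pair: agree(*pair) for pair in combinations((1, 2, 3), 2)}


def product_rule_holds(keys):
    """Check the product rule for every 0/1 assignment to the variables in keys."""
    for vals in product((0, 1), repeat=len(keys)):
        joint = prob(lambda w: all(Y[k](w) == v for k, v in zip(keys, vals)))
        marginals = Fraction(1)
        for k, v in zip(keys, vals):
            marginals *= prob(lambda w: Y[k](w) == v)
        if joint != marginals:
            return False
    return True


print(all(product_rule_holds(pair) for pair in combinations(Y, 2)))  # True: pairwise
print(product_rule_holds(tuple(Y)))                                   # False: not mutual
```

The failure of mutual independence is easy to see by hand as well: if tosses 1 and 2 agree and tosses 1 and 3 agree, then tosses 2 and 3 must agree, so Y_12 = Y_13 = 1 forces Y_23 = 1.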

We next present a number of useful tools for establishing independence. 

Theorem 8.6. Let X be a random variable with image S, and Y be a random
variable with image T. Further, suppose that f : S → [0, 1] and g : T → [0, 1]
are functions such that

$$\sum_{s \in S} f(s) = \sum_{t \in T} g(t) = 1, \tag{8.14}$$

and that for all s ∈ S and t ∈ T, we have

$$\mathsf{P}[(X = s) \cap (Y = t)] = f(s)\,g(t). \tag{8.15}$$

Then X and Y are independent, the distribution of X is f, and the distribution of
Y is g.

Proof. Since {Y = t}_{t∈T} is a partition of the sample space, making use of total
probability (8.9), along with (8.15) and (8.14), we see that for all s ∈ S, we have

$$\mathsf{P}[X = s] = \sum_{t \in T} \mathsf{P}[(X = s) \cap (Y = t)] = \sum_{t \in T} f(s)\,g(t) = f(s) \sum_{t \in T} g(t) = f(s).$$

Thus, the distribution of X is indeed f. Exchanging the roles of X and Y in the
above argument, we see that the distribution of Y is g. Combining this with (8.15),
we see that X and Y are independent. □

The generalization of Theorem 8.6 to families of random variables is a bit messy, 
but the basic idea is the same: 

Theorem 8.7. Let {X_i}_{i∈I} be a finite family of random variables, where each X_i
has image S_i. Also, let {f_i}_{i∈I} be a family of functions, where for each i ∈ I,
f_i : S_i → [0, 1] and Σ_{s_i ∈ S_i} f_i(s_i) = 1. Further, suppose that

$$\mathsf{P}\Big[\bigcap_{i \in I} (X_i = s_i)\Big] = \prod_{i \in I} f_i(s_i)$$

for each assignment {s_i}_{i∈I} to {X_i}_{i∈I}. Then the family {X_i}_{i∈I} is mutually
independent, and for each i ∈ I, the distribution of X_i is f_i.

Proof. To prove the theorem, it suffices to prove the following statement: for every
subset J of I, and every assignment {s_j}_{j∈J} to {X_j}_{j∈J}, we have

$$\mathsf{P}\Big[\bigcap_{j \in J} (X_j = s_j)\Big] = \prod_{j \in J} f_j(s_j).$$

Moreover, it suffices to prove this statement for the case where J = I \ {d}, for
an arbitrary d ∈ I: this allows us to eliminate any one variable from the family,
without affecting the hypotheses, and by repeating this procedure, we can eliminate
any number of variables.

Thus, let d ∈ I be fixed, let J := I \ {d}, and let {s_j}_{j∈J} be a fixed assignment
to {X_j}_{j∈J}. Then, since {X_d = s_d}_{s_d∈S_d} is a partition of the sample space, we have

$$\begin{aligned}
\mathsf{P}\Big[\bigcap_{j \in J} (X_j = s_j)\Big]
&= \mathsf{P}\Big[\bigcup_{s_d \in S_d} \bigcap_{i \in I} (X_i = s_i)\Big]
= \sum_{s_d \in S_d} \mathsf{P}\Big[\bigcap_{i \in I} (X_i = s_i)\Big] \\
&= \sum_{s_d \in S_d} \prod_{i \in I} f_i(s_i)
= \prod_{j \in J} f_j(s_j) \cdot \sum_{s_d \in S_d} f_d(s_d)
= \prod_{j \in J} f_j(s_j). \qquad \square
\end{aligned}$$

This theorem has several immediate consequences. First of all, mutual inde- 
pendence may be more simply characterized: 

Theorem 8.8. Let {X_i}_{i∈I} be a finite family of random variables. Suppose that for
every assignment {s_i}_{i∈I} to {X_i}_{i∈I}, we have

$$\mathsf{P}\Big[\bigcap_{i \in I} (X_i = s_i)\Big] = \prod_{i \in I} \mathsf{P}[X_i = s_i].$$

Then {X_i}_{i∈I} is mutually independent.

Theorem 8.8 says that to check for mutual independence, we only have to consider
the index set J = I in (8.13). Put another way, it says that a family of
random variables {X_i}_{i=1}^n is mutually independent if and only if the distribution of
(X_1, …, X_n) is essentially equal to the product of the distributions of the individual
X_i’s.

Based on the definition of mutual independence, and its characterization in The- 
orem 8.8, the following is also immediate: 

Theorem 8.9. Suppose {X_i}_{i=1}^n is a family of random variables, and that m is an
integer with 0 < m < n. Then the following are equivalent:

(i) {X_i}_{i=1}^n is mutually independent;

(ii) {X_i}_{i=1}^m is mutually independent, {X_i}_{i=m+1}^n is mutually independent, and
the two variables (X_1, …, X_m) and (X_{m+1}, …, X_n) are independent.

The following is also an immediate consequence of Theorem 8.7 (it also follows 
easily from Theorem 8.4). 

Theorem 8.10. Suppose that X_1, …, X_n are random variables, and that S_1, …, S_n
are finite sets. Then the following are equivalent:

(i) (X_1, …, X_n) is uniformly distributed over S_1 × ⋯ × S_n;

(ii) {X_i}_{i=1}^n is mutually independent, with each X_i uniformly distributed over
S_i.

Another immediate consequence of Theorem 8.7 is the following: 

Theorem 8.11. Suppose P is the product distribution P_1 ⋯ P_n, where each P_i
is a probability distribution on a sample space Ω_i, so that the sample space of P
is Ω = Ω_1 × ⋯ × Ω_n. For each i = 1, …, n, let X_i be the random variable
that projects on the ith coordinate, so that X_i(ω_1, …, ω_n) = ω_i. Then {X_i}_{i=1}^n is
mutually independent, and for each i = 1, …, n, the distribution of X_i is P_i.

Theorem 8.11 is often used to synthesize independent random variables “out 
of thin air,” by taking the product of appropriate probability distributions. Other 
arguments may then be used to prove the independence of variables derived from 
these. 

Example 8.24. Theorem 8.11 immediately implies that in Example 8.18, the family
of indicator variables {X_i}_{i=1}^n is mutually independent. □

The following theorem gives us yet another way to establish independence. 

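Theorem 8.12. Suppose {X_i}_{i=1}^n is a mutually independent family of random
variables. Further, suppose that for i = 1, …, n, Y_i := g_i(X_i) for some function g_i.
Then {Y_i}_{i=1}^n is mutually independent.

Proof. It suffices to prove the theorem for n = 2. The general case follows easily
by induction, using Theorem 8.9. For i = 1, 2, let t_i be any value in the image of
Y_i, and let S′_i := g_i^{-1}({t_i}). We have

$$\begin{aligned}
\mathsf{P}[(Y_1 = t_1) \cap (Y_2 = t_2)]
&= \mathsf{P}\Big[\Big(\bigcup_{s_1 \in S'_1} (X_1 = s_1)\Big) \cap \Big(\bigcup_{s_2 \in S'_2} (X_2 = s_2)\Big)\Big] \\
&= \mathsf{P}\Big[\bigcup_{s_1 \in S'_1} \bigcup_{s_2 \in S'_2} \big((X_1 = s_1) \cap (X_2 = s_2)\big)\Big] \\
&= \sum_{s_1 \in S'_1} \sum_{s_2 \in S'_2} \mathsf{P}[(X_1 = s_1) \cap (X_2 = s_2)] \\
&= \sum_{s_1 \in S'_1} \sum_{s_2 \in S'_2} \mathsf{P}[X_1 = s_1]\,\mathsf{P}[X_2 = s_2] \\
&= \Big(\sum_{s_1 \in S'_1} \mathsf{P}[X_1 = s_1]\Big)\Big(\sum_{s_2 \in S'_2} \mathsf{P}[X_2 = s_2]\Big) \\
&= \mathsf{P}\Big[\bigcup_{s_1 \in S'_1} (X_1 = s_1)\Big]\,\mathsf{P}\Big[\bigcup_{s_2 \in S'_2} (X_2 = s_2)\Big]
= \mathsf{P}[Y_1 = t_1]\,\mathsf{P}[Y_2 = t_2]. \qquad \square
\end{aligned}$$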
As a special case of the above theorem, if each g_i is the characteristic function
for some subset S′_i of the image of X_i, then X_1 ∈ S′_1, …, X_n ∈ S′_n form a mutually
independent family of events.

The next theorem is quite handy in proving the independence of random vari- 
ables in a variety of algebraic settings. 

Theorem 8.13. Suppose that G is a finite abelian group, and that W is a random
variable uniformly distributed over G. Let Z be another random variable, taking
values in some finite set U, and suppose that W and Z are independent. Let
σ : U → G be some function, and define Y := W + σ(Z). Then Y is uniformly
distributed over G, and Y and Z are independent.

Proof. Consider any fixed values t ∈ G and u ∈ U. Evidently, the events
(Y = t) ∩ (Z = u) and (W = t − σ(u)) ∩ (Z = u) are the same, and therefore,
because W and Z are independent, we have

$$\mathsf{P}[(Y = t) \cap (Z = u)] = \mathsf{P}[W = t - \sigma(u)]\,\mathsf{P}[Z = u] = \frac{1}{|G|}\,\mathsf{P}[Z = u]. \tag{8.16}$$

Since this holds for every u ∈ U, making use of total probability (8.9), we have

$$\mathsf{P}[Y = t] = \sum_{u \in U} \mathsf{P}[(Y = t) \cap (Z = u)] = \frac{1}{|G|} \sum_{u \in U} \mathsf{P}[Z = u] = \frac{1}{|G|}.$$

Thus, Y is uniformly distributed over G, and by (8.16), Y and Z are independent.
(This conclusion could also have been deduced directly from (8.16) using Theorem
8.6; we have repeated the argument here.) □

Note that in the above theorem, we make no assumption about the distribution
of Z, or any properties of the function σ.

Example 8.25. Theorem 8.13 may be used to justify the security of the one-time
pad encryption scheme. Here, the variable W represents a random, secret key (the
“pad”) that is shared between Alice and Bob; U represents a space of possible
messages; Z represents a “message source,” from which Alice draws her message
according to some distribution; finally, the function σ : U → G represents some
invertible “encoding transformation” that maps messages into group elements.

To encrypt a message drawn from the message source, Alice encodes the message
as a group element, and then adds the pad. The variable Y := W + σ(Z)
represents the resulting ciphertext. Since Z = σ^{-1}(Y − W), when Bob receives the
ciphertext, he decrypts it by subtracting the pad, and converting the resulting group
element back into a message. Because the message source Z and ciphertext Y are
independent, an eavesdropping adversary who learns the value of Y does not learn
anything about Alice’s message: for any particular ciphertext t, the conditional
distribution of Z given Y = t is the same as the distribution of Z.

The term “one time” comes from the fact that a given encryption key should 
be used only once; otherwise, security may be compromised. Indeed, suppose the 
key is used a second time, encrypting a message drawn from a second source Z′.
The second ciphertext is represented by the random variable Y′ := W + σ(Z′). In
general, the random variables (Z, Z′) and (Y, Y′) will not be independent, since
Y − Y′ = σ(Z) − σ(Z′). To illustrate this more concretely, suppose Z is uniformly
distributed over a set of 1000 messages, Z′ is uniformly distributed over a set of
two messages, say, {u′_1, u′_2}, and that Z and Z′ are independent. Now, without
any further information about Z, an adversary would have at best a 1-in-1000
chance of guessing its value. However, if he sees that Y = t and Y′ = t′, for
particular values t, t′ ∈ G, then he has a 1-in-2 chance, since the value of Z is
equally likely to be one of just two messages, namely, u_1 := σ^{-1}(t − t′ + σ(u′_1)) and
u_2 := σ^{-1}(t − t′ + σ(u′_2)); more formally, the conditional distribution of Z given
(Y = t) ∩ (Y′ = t′) is essentially the uniform distribution on {u_1, u_2}.

In practice, it is convenient to define the group G to be the group of all bit
strings of some fixed length, with bit-wise exclusive-or as the group operation.
The encoding function σ simply “serializes” a message as a bit string. □
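The sketch below illustrates this example under the practical choices just described: G is taken to be byte strings of a fixed length under XOR, and σ is the identity on bytes (both assumptions for illustration). It also checks equation (8.16) of Theorem 8.13 exhaustively over a small cyclic group ℤ_n with a deliberately skewed message distribution and an arbitrary σ, since the theorem assumes nothing about either. The function names are hypothetical; `secrets.token_bytes` is the Python standard-library source of cryptographically strong random bytes.

```python
import secrets
from fractions import Fraction


def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """Y := W + sigma(Z) in the group of byte strings under XOR (sigma = identity)."""
    if len(pad) != len(message):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(m ^ k for m, k in zip(message, pad))


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    # Every element of this group is its own inverse, so subtracting the pad is XOR again.
    return otp_encrypt(ciphertext, pad)


def check_8_16(n, z_dist, sigma):
    """Over G = Z_n with W uniform and independent of Z, verify
    P[(Y = t) and (Z = u)] == P[Z = u] / n for all t, u, where Y := W + sigma(Z)."""
    joint = {}
    for w in range(n):
        for u, pu in z_dist.items():
            t = (w + sigma(u)) % n
            joint[(t, u)] = joint.get((t, u), Fraction(0)) + Fraction(1, n) * pu
    return all(joint.get((t, u), Fraction(0)) == pu / n
               for t in range(n) for u, pu in z_dist.items())


pad = secrets.token_bytes(5)
c1 = otp_encrypt(b"hello", pad)
print(otp_decrypt(c1, pad))  # b'hello'

# A skewed message source and an arbitrary sigma (here: string length, reduced mod 7).
skewed = {"yes": Fraction(9, 10), "no": Fraction(1, 10)}
print(check_8_16(7, skewed, len))  # True

# Reusing the pad: c1 XOR c2 equals the XOR of the two messages, whatever the pad was.
c2 = otp_encrypt(b"world", pad)
leak = bytes(a ^ b for a, b in zip(c1, c2))
print(leak == bytes(a ^ b for a, b in zip(b"hello", b"world")))  # True
```

The final check is the concrete form of Y − Y′ = σ(Z) − σ(Z′): with XOR, subtraction and addition coincide, so two ciphertexts under the same pad reveal the XOR of the two plaintexts.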

Example 8.26. Theorem 8.13 may also be used to justify a very simple type of 
secret sharing. A colorful, if militaristic, motivating scenario is the following. 
To launch a nuclear missile, two officers who carry special keys must insert their 
keys simultaneously into the “authorization device” (at least, that is how it works in 
Hollywood). In the digital version of this scenario, an authorization device contains 
a secret, digital “launch code,” and each officer holds a digital “share” of this code, 
so that (i) individually, each share reveals no information about the launch code, 
but (ii) collectively, the two shares may be combined in a simple way to derive the 
launch code. Thus, to launch the missile, both officers must input their shares into 
the authorization device; hardware in the authorization device combines the two 
shares, and compares the resulting code against the launch code it stores — if they 
match, the missile flies. 

In the language of Theorem 8.13, the launch code is represented by the random
variable Z, and the two shares by W and Y := W + σ(Z), where (as in the previous
example) σ : U → G is some simple, invertible encoding function. Because W and
Z are independent, information about the share W leaks no information about the
launch code Z; likewise, since Y and Z are independent, information about Y leaks
no information about Z. However, by combining both shares, the launch code is
easily constructed as Z = σ^{-1}(Y − W). □
Example 8.27. Let k be a positive integer. This example shows how we can take a
mutually independent family of k random variables, and, from it, construct a much
larger, k-wise independent family of random variables.

Let p be a prime, with p > k. Let {H_i}_{i=0}^{k−1} be a mutually independent family
of random variables, each of which is uniformly distributed over ℤ_p. Let us
set H := (H_0, …, H_{k−1}), which, by assumption, is uniformly distributed over
ℤ_p^{×k}. For each s ∈ ℤ_p, we define the function ρ_s : ℤ_p^{×k} → ℤ_p as follows: for
r = (r_0, …, r_{k−1}) ∈ ℤ_p^{×k}, ρ_s(r) := Σ_{i=0}^{k−1} r_i s^i; that is, ρ_s(r) is the value
obtained by evaluating the polynomial r_0 + r_1 X + ⋯ + r_{k−1} X^{k−1} ∈ ℤ_p[X] at the point s.

Each s ∈ ℤ_p defines a random variable ρ_s(H) = H_0 + H_1 s + ⋯ + H_{k−1} s^{k−1}. We
claim that the family of random variables {ρ_s(H)}_{s∈ℤ_p} is k-wise independent, with
each individual ρ_s(H) uniformly distributed over ℤ_p. By Theorem 8.10, it suffices
to show the following: for all distinct points s_1, …, s_k ∈ ℤ_p, the random variable
W := (ρ_{s_1}(H), …, ρ_{s_k}(H)) is uniformly distributed over ℤ_p^{×k}. So let s_1, …, s_k be
fixed, distinct elements of ℤ_p, and define the function

$$\rho : \mathbb{Z}_p^{\times k} \to \mathbb{Z}_p^{\times k}, \qquad r \mapsto (\rho_{s_1}(r), \ldots, \rho_{s_k}(r)). \tag{8.17}$$

Thus, W = ρ(H), and by Lagrange interpolation (Theorem 7.15), the function ρ is
a bijection; moreover, since H is uniformly distributed over ℤ_p^{×k}, so is W.

Of course, the field ℤ_p may be replaced by an arbitrary finite field. □
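For small parameters the bijectivity of ρ can be checked directly: enumerate every coefficient vector r ∈ ℤ_p^{×k} (which is what it means for H to be uniform) and confirm that the evaluation vector at k distinct points hits each element of ℤ_p^{×k} exactly once. This is a sketch with illustrative values p = 5 and k = 3; `rho` and `evaluations_uniform` are hypothetical names, and elements of ℤ_p are represented by 0, …, p − 1.

```python
from collections import Counter
from itertools import combinations, product


def rho(s, r, p):
    """rho_s(r) = r_0 + r_1 s + ... + r_{k-1} s^{k-1} mod p, via Horner's rule."""
    acc = 0
    for coeff in reversed(r):
        acc = (acc * s + coeff) % p
    return acc


def evaluations_uniform(p, k, points):
    """Enumerate every r in Z_p^k (H uniform) and check that
    (rho_{s_1}(r), ..., rho_{s_k}(r)) hits each vector of Z_p^k exactly once."""
    counts = Counter(tuple(rho(s, r, p) for s in points)
                     for r in product(range(p), repeat=k))
    return len(counts) == p**k and set(counts.values()) == {1}


p, k = 5, 3  # small illustrative parameters
print(all(evaluations_uniform(p, k, pts)
          for pts in combinations(range(p), k)))  # True

# With k + 1 points, the evaluation vector takes at most p^k of the p^(k+1) possible
# values, so it cannot be uniform: the family is not (k + 1)-wise independent.
pts = tuple(range(k + 1))
images = {tuple(rho(s, r, p) for s in pts) for r in product(range(p), repeat=k)}
print(len(images), p**(k + 1))  # 125 625
```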


Example 8.28. Consider again the secret sharing scenario of Example 8.26. Suppose
at the critical moment, one of the officers is missing in action. The military
planners would perhaps like a more flexible secret sharing scheme; for example,
perhaps shares of the launch code should be distributed to three officers, in such a
way that no single officer can authorize a launch, but any two can. More generally,
for positive integers k and ℓ, with ℓ ≥ k + 1, the scheme should distribute shares
among ℓ officers, so that no coalition of k (or fewer) officers can authorize a launch,
yet any coalition of k + 1 officers can. Using the construction of the previous
example, this is easily achieved, as follows.

Let us model the secret launch code as a random variable Z, taking values in
a finite set U. Assume that p is prime, with p > ℓ, and that σ : U → ℤ_p is
a simple, invertible encoding function. To construct the shares, we make use of
random variables H_0, …, H_{k−1}, where each H_i is uniformly distributed over ℤ_p,
and the family of random variables H_0, …, H_{k−1}, Z is mutually independent. For
each s ∈ ℤ_p, we define the random variable

$$Y_s := H_0 + H_1 s + \cdots + H_{k-1} s^{k-1} + \sigma(Z) s^k.$$
We can pick any subset S ⊆ ℤ_p of size ℓ that we wish, so that for each s ∈ S, an
officer gets the secret share Y_s (along with the public value s).

First, we show how any coalition of k + 1 officers can reconstruct the launch code
from their collection of shares, say, Y_{s_1}, …, Y_{s_{k+1}}. This is easily done by means of
the Lagrange interpolation formula (again, Theorem 7.15). Indeed, we only need
to recover the high-order coefficient, σ(Z), which we can obtain via the formula

$$\sigma(Z) = \sum_{i=1}^{k+1} \frac{Y_{s_i}}{\prod_{j \neq i} (s_i - s_j)}.$$

Second, we show that no coalition of k officers learns anything about the launch
code, even if they pool their shares. Formally, this means that if s_1, …, s_k are
fixed, distinct points, then Y_{s_1}, …, Y_{s_k}, Z form a mutually independent family of
random variables. This is easily seen, as follows. Define H := (H_0, …, H_{k−1}), and
W := ρ(H), where ρ : ℤ_p^{×k} → ℤ_p^{×k} is as defined in (8.17), and set
Y := (Y_{s_1}, …, Y_{s_k}). Now, by hypothesis, H and Z are independent, and H is uniformly
distributed over ℤ_p^{×k}. As we noted in Example 8.27, ρ is a bijection, and hence, W is
uniformly distributed over ℤ_p^{×k}; moreover (by Theorem 8.12), W and Z are independent.
Observe that Y = W + σ′(Z), where σ′ maps u ∈ U to (σ(u)s_1^k, …, σ(u)s_k^k) ∈ ℤ_p^{×k},
and so applying Theorem 8.13 (with the group ℤ_p^{×k}, the random variables W and
Z, and the function σ′), we see that Y and Z are independent, where Y is uniformly
distributed over ℤ_p^{×k}. From this, it follows (using Theorems 8.9 and 8.10) that the
family of random variables Y_{s_1}, …, Y_{s_k}, Z is mutually independent, with each Y_{s_i}
uniformly distributed over ℤ_p.

Finally, we note that when k = 1, ℓ = 2, and S = {0, 1}, this construction
degenerates to the construction in Example 8.26 (with the additive group ℤ_p). □
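Both halves of this argument can be exercised in a short sketch. Shares are built exactly as Y_s above, with the already-encoded secret playing the role of σ(Z); reconstruction uses the leading-coefficient formula; and, for a small prime, an exhaustive count shows that a k-officer coalition sees the same distribution of share vectors whatever the secret is. The parameters (p = 101, k = 2, the points 1, …, 5) and all function names are illustrative assumptions; `secrets.randbelow` supplies the uniform H_i, and `pow(x, -1, p)` (modular inverse) needs Python 3.8 or later.

```python
import secrets
from itertools import combinations, product


def poly_eval(coeffs, s, p):
    """Evaluate coeffs[0] + coeffs[1] s + ... mod p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * s + c) % p
    return acc


def make_shares(secret, k, points, p):
    """Y_s := H_0 + H_1 s + ... + H_{k-1} s^{k-1} + secret * s^k (mod p), with each H_i
    drawn uniformly from Z_p; `secret` plays the role of the encoded code sigma(Z)."""
    coeffs = [secrets.randbelow(p) for _ in range(k)] + [secret % p]
    return {s: poly_eval(coeffs, s, p) for s in points}


def recover_secret(shares, p):
    """sigma(Z) = sum_i Y_{s_i} / prod_{j != i} (s_i - s_j) mod p, from k + 1 shares."""
    total = 0
    for s_i, y_i in shares.items():
        denom = 1
        for s_j in shares:
            if s_j != s_i:
                denom = denom * (s_i - s_j) % p
        total = (total + y_i * pow(denom, -1, p)) % p  # modular inverse (Python 3.8+)
    return total


def coalition_view(secret, k, coalition, p):
    """Sorted list of the share vectors a coalition sees as (H_0, ..., H_{k-1})
    ranges over all of Z_p^k, i.e. their joint distribution up to scaling."""
    return sorted(tuple(poly_eval(list(h) + [secret], s, p) for s in coalition)
                  for h in product(range(p), repeat=k))


p, k = 101, 2          # illustrative prime and k: any k + 1 = 3 officers suffice
S = [1, 2, 3, 4, 5]    # ell = 5 public evaluation points
shares = make_shares(42, k, S, p)
print({recover_secret({s: shares[s] for s in c}, p)
       for c in combinations(S, k + 1)})  # {42}

# With a small prime, k = 2 officers see the same share distribution for every secret.
print(len({tuple(coalition_view(x, 2, (1, 2), 7)) for x in range(7)}))  # 1
```

Storing the secret as the leading coefficient follows the text; the more common textbook variant of this scheme stores it as the constant term instead, and either choice works with the same interpolation machinery.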


Exercise 8.12. Suppose X and X′ are random variables that take values in a set
S and that have essentially the same distribution. Show that if f : S → T is a
function, then f(X) and f(X′) have essentially the same distribution.

Exercise 8.13. Let {X_i}_{i=1}^n be a family of random variables, and let S_i be the
image of X_i for i = 1, …, n. Show that {X_i}_{i=1}^n is mutually independent if and only
if for each i = 2, …, n, and for all s_1 ∈ S_1, …, s_i ∈ S_i, we have

$$\mathsf{P}[X_i = s_i \mid (X_1 = s_1) \cap \cdots \cap (X_{i-1} = s_{i-1})] = \mathsf{P}[X_i = s_i].$$

Exercise 8.14. Suppose that ρ : G → G′ is a surjective group homomorphism,
where G and G′ are finite abelian groups. Show that if g′, h′ ∈ G′, and
X and Y are independent random variables, where X is uniformly distributed over
ρ^{-1}({g′}), and Y takes values in ρ^{-1}({h′}), then X + Y is uniformly distributed over
ρ^{-1}({g′ + h′}).
Exercise 8.15. Suppose X and Y are random variables, where X takes values in
S, and Y takes values in T. Further suppose that Y′ is uniformly distributed over
T, and that (X, Y) and Y′ are independent. Let φ be a predicate on S × T. Show
that P[φ(X, Y) ∩ (Y = Y′)] = P[φ(X, Y)]/|T|.

Exercise 8.16. Let X and Y be independent random variables, where X is uniformly
distributed over a set S, and Y is uniformly distributed over a set T ⊆ S.
Define a third random variable Z as follows: if X ∈ T, then Z := X; otherwise,
Z := Y. Show that Z is uniformly distributed over T.

Exercise 8.17. Let n be a positive integer, and let X be a random variable, uniformly
distributed over {0, …, n − 1}. For each positive divisor d of n, let us define
the random variable X_d := X mod d. Show that:

(a) if d is a divisor of n, then the variable X_d is uniformly distributed over
{0, …, d − 1};

(b) if d_1, …, d_k are divisors of n, then {X_{d_i}}_{i=1}^k is mutually independent if and
only if {d_i}_{i=1}^k is pairwise relatively prime.

Exercise 8.18. Suppose X and Y are random variables, each uniformly distributed
over ℤ_2, but not necessarily independent. Show that the distribution of
(X, Y) is the same as the distribution of (X + 1, Y + 1).

Exercise 8.19. Let I := {1, …, n}, where n > 2, let B := {0, 1}, and let G be a
finite abelian group, with |G| > 1. Suppose that {X_{ib}}_{(i,b)∈I×B} is a mutually
independent family of random variables, each uniformly distributed over G. For each
β = (b_1, …, b_n) ∈ B^{×n}, let us define the random variable Y_β := X_{1b_1} + ⋯ + X_{nb_n}.
Show that each Y_β is uniformly distributed over G, and that {Y_β}_{β∈B^{×n}} is 3-wise
independent, but not 4-wise independent.


8.4 Expectation and variance 

Let P be a probability distribution on a sample space Ω. If X is a real-valued
random variable, then its expected value, or expectation, is

$$\mathsf{E}[X] := \sum_{\omega \in \Omega} X(\omega)\,\mathsf{P}(\omega). \tag{8.18}$$

If S is the image of X, and if for each s ∈ S we group together the terms in (8.18)
with X(ω) = s, then we see that

$$\mathsf{E}[X] = \sum_{s \in S} s\,\mathsf{P}[X = s]. \tag{8.19}$$
From (8.19), it is clear that E[X] depends only on the distribution of X: if X′ is
another random variable with the same (or essentially the same) distribution as X,
then E[X] = E[X′].

More generally, suppose X is an arbitrary random variable (not necessarily real
valued) whose image is S, and f is a real-valued function on S. Then again, if for
each s ∈ S we group together the terms in (8.18) with X(ω) = s, we see that

$$\mathsf{E}[f(X)] = \sum_{s \in S} f(s)\,\mathsf{P}[X = s]. \tag{8.20}$$

We make a few trivial observations about expectation, which the reader may
easily verify. First, if X is equal to a constant c (i.e., X(ω) = c for every ω ∈ Ω),
then E[X] = E[c] = c. Second, if X and Y are random variables such that X ≥ Y
(i.e., X(ω) ≥ Y(ω) for every ω ∈ Ω), then E[X] ≥ E[Y]. Similarly, if X > Y, then
E[X] > E[Y].

In calculating expectations, one rarely makes direct use of (8.18), (8.19), or
(8.20), except in rather trivial situations. The next two theorems develop tools that
are often quite effective in calculating expectations.

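Theorem 8.14 (Linearity of expectation). If X and Y are real-valued random
variables, and a is a real number, then

$$\mathsf{E}[X + Y] = \mathsf{E}[X] + \mathsf{E}[Y] \quad\text{and}\quad \mathsf{E}[aX] = a\,\mathsf{E}[X].$$

Proof. It is easiest to prove this using the defining equation (8.18) for expectation.
For ω ∈ Ω, the value of the random variable X + Y at ω is by definition X(ω) + Y(ω),
and so we have

$$\begin{aligned}
\mathsf{E}[X + Y] &= \sum_{\omega} (X(\omega) + Y(\omega))\,\mathsf{P}(\omega) \\
&= \sum_{\omega} X(\omega)\,\mathsf{P}(\omega) + \sum_{\omega} Y(\omega)\,\mathsf{P}(\omega) \\
&= \mathsf{E}[X] + \mathsf{E}[Y].
\end{aligned}$$

For the second part of the theorem, by a similar calculation, we have

$$\mathsf{E}[aX] = \sum_{\omega} (a X(\omega))\,\mathsf{P}(\omega) = a \sum_{\omega} X(\omega)\,\mathsf{P}(\omega) = a\,\mathsf{E}[X]. \qquad \square$$

More generally, the above theorem implies (using a simple induction argument)
that if {X_i}_{i∈I} is a finite family of real-valued random variables, then we have

$$\mathsf{E}\Big[\sum_{i \in I} X_i\Big] = \sum_{i \in I} \mathsf{E}[X_i]. \tag{8.21}$$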

So we see that expectation is linear; however, expectation is not in general mul- 
tiplicative, except in the case of independent random variables: 
Theorem 8.15. If X and Y are independent, real-valued random variables, then
E[XY] = E[X] E[Y].

Proof. It is easiest to prove this using (8.20), with the function f(s, t) := st applied
to the random variable (X, Y). We have

$$\begin{aligned}
\mathsf{E}[XY] &= \sum_{s,t} st\,\mathsf{P}[(X = s) \cap (Y = t)] \\
&= \sum_{s,t} st\,\mathsf{P}[X = s]\,\mathsf{P}[Y = t] \\
&= \Big(\sum_{s} s\,\mathsf{P}[X = s]\Big)\Big(\sum_{t} t\,\mathsf{P}[Y = t]\Big) \\
&= \mathsf{E}[X]\,\mathsf{E}[Y]. \qquad \square
\end{aligned}$$

More generally, the above theorem implies (using a simple induction argument)
that if {X_i}_{i∈I} is a finite, mutually independent family of real-valued random
variables, then

$$\mathsf{E}\Big[\prod_{i \in I} X_i\Big] = \prod_{i \in I} \mathsf{E}[X_i]. \tag{8.22}$$
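To see linearity holding unconditionally while multiplicativity needs independence, the sketch below computes expectations directly from (8.18) on the two-dice space of Example 8.16. X and Y are independent there, while X and Z = X + Y are not (Example 8.20), so E[XZ] and E[X] E[Z] differ. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # two fair dice
P = Fraction(1, 36)                           # P(omega) for every outcome


def E(rv):
    """E[rv] := sum over omega of rv(omega) * P(omega), as in (8.18)."""
    return sum(rv(w) * P for w in OMEGA)


def X(w):
    return w[0]


def Y(w):
    return w[1]


def Z(w):
    return w[0] + w[1]


print(E(lambda w: X(w) + Y(w)) == E(X) + E(Y))  # linearity (Theorem 8.14): True
print(E(lambda w: X(w) * Y(w)) == E(X) * E(Y))  # X, Y independent (Theorem 8.15): True
print(E(lambda w: X(w) * Z(w)), E(X) * E(Z))    # X, Z dependent: 329/12 49/2
```

By hand: E[XZ] = E[X²] + E[XY] = 91/6 + 49/4 = 329/12, whereas E[X] E[Z] = (7/2) · 7 = 49/2.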

The following simple facts are also sometimes quite useful in calculating expec- 
tations: 

Theorem 8.16. Let $X$ be a 0/1-valued random variable. Then $E[X] = P[X = 1]$.

Proof. $E[X] = 0 \cdot P[X = 0] + 1 \cdot P[X = 1] = P[X = 1]$. □

Theorem 8.17. If $X$ is a random variable that takes only non-negative integer
values, then
$$E[X] = \sum_{i \ge 1} P[X \ge i].$$
Note that since $X$ has a finite image, the sum appearing above is finite.

Proof. Suppose that the image of $X$ is contained in $\{0, \ldots, n\}$, and for $i = 1, \ldots, n$,
let $X_i$ be the indicator variable for the event $X \ge i$. Then $X = X_1 + \cdots + X_n$, and
by linearity of expectation and Theorem 8.16, we have
$$E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} P[X \ge i]. \qquad \square$$
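Theorem 8.17 gives a second way to compute the same number. The sketch below (our own illustration; `expect_direct` and `expect_tail` are hypothetical names) computes an expectation both from (8.19) and as a sum of tail probabilities, for a distribution given as a dictionary from values to exact probabilities.

```python
from fractions import Fraction


def expect_direct(dist):
    """E[X] as the sum of s * P[X = s] over the image of X, as in (8.19)."""
    return sum(s * p for s, p in dist.items())


def expect_tail(dist):
    """E[X] as the sum over i >= 1 of P[X >= i] (Theorem 8.17).

    Assumes the keys of dist are non-negative integers."""
    n = max(dist)
    return sum(sum(p for s, p in dist.items() if s >= i) for i in range(1, n + 1))


# Number of heads in three fair coin tosses.
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```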

Let $X$ be a real-valued random variable with $\mu := E[X]$. The variance of $X$ is
$\mathrm{Var}[X] := E[(X - \mu)^2]$. The variance provides a measure of the spread or dispersion
of the distribution of $X$ around its expected value. Note that since $(X - \mu)^2$ takes
only non-negative values, variance is always non-negative.
Theorem 8.18. Let $X$ be a real-valued random variable, with $\mu := E[X]$, and let $a$
and $b$ be real numbers. Then we have

(i) $\mathrm{Var}[X] = E[X^2] - \mu^2$,

(ii) $\mathrm{Var}[aX] = a^2\, \mathrm{Var}[X]$, and
(iii) $\mathrm{Var}[X + b] = \mathrm{Var}[X]$.

Proof. For part (i), observe that
$$\begin{aligned} \mathrm{Var}[X] &= E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] \\ &= E[X^2] - 2\mu\, E[X] + E[\mu^2] = E[X^2] - 2\mu^2 + \mu^2 \\ &= E[X^2] - \mu^2, \end{aligned}$$
where in the third equality, we used the fact that expectation is linear, and in the
fourth equality, we used the fact that $E[c] = c$ for constant $c$ (in this case, $c = \mu^2$).
For part (ii), observe that
$$\begin{aligned} \mathrm{Var}[aX] &= E[a^2 X^2] - E[aX]^2 = a^2 E[X^2] - (a\mu)^2 \\ &= a^2 (E[X^2] - \mu^2) = a^2\, \mathrm{Var}[X], \end{aligned}$$
where we used part (i) in the first and fourth equalities, and the linearity of expectation in the second.

Part (iii) follows by a similar calculation:
$$\begin{aligned} \mathrm{Var}[X + b] &= E[(X + b)^2] - (\mu + b)^2 \\ &= (E[X^2] + 2b\mu + b^2) - (\mu^2 + 2b\mu + b^2) \\ &= E[X^2] - \mu^2 = \mathrm{Var}[X]. \qquad \square \end{aligned}$$

The following is an immediate consequence of part (i) of Theorem 8.18, and the 
fact that variance is always non-negative: 

Theorem 8.19. If $X$ is a real-valued random variable, then $E[X^2] \ge E[X]^2$.

Unlike expectation, the variance of a sum of random variables is not in general equal to the
sum of the variances, although it is when the variables are pairwise independent:

Theorem 8.20. If $\{X_i\}_{i \in I}$ is a finite, pairwise independent family of real-valued
random variables, then
$$\mathrm{Var}\Big[\sum_{i \in I} X_i\Big] = \sum_{i \in I} \mathrm{Var}[X_i].$$
Proof. We have
$$\mathrm{Var}\Big[\sum_{i \in I} X_i\Big] = E\Big[\Big(\sum_{i \in I} X_i\Big)^2\Big] - \Big(E\Big[\sum_{i \in I} X_i\Big]\Big)^2$$
$$= \sum_{i \in I} E[X_i^2] + \sum_{\substack{i, j \in I \\ i \ne j}} \big(E[X_i X_j] - E[X_i]\, E[X_j]\big) - \sum_{i \in I} E[X_i]^2$$
(by linearity of expectation and rearranging terms)
$$= \sum_{i \in I} E[X_i^2] - \sum_{i \in I} E[X_i]^2$$
(by pairwise independence and Theorem 8.15)
$$= \sum_{i \in I} \mathrm{Var}[X_i]. \qquad \square$$
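Theorem 8.20 needs only pairwise independence. The sketch below (our own example, not from the text) uses the classic construction of two fair bits and their XOR, which is pairwise but not mutually independent, and confirms by exact enumeration that the variance of the sum still equals the sum of the variances.

```python
from fractions import Fraction
from itertools import combinations, product

omega = list(product((0, 1), repeat=2))   # two independent fair bits
P = {w: Fraction(1, 4) for w in omega}


def expect(f):
    return sum(f(w) * P[w] for w in omega)


def var(f):
    mu = expect(f)
    return expect(lambda w: (f(w) - mu) ** 2)


# X1 and X2 are the two bits and X3 is their XOR: any two of these are
# independent, but the three together are not (X3 is determined by X1, X2).
Xs = [lambda w: w[0], lambda w: w[1], lambda w: w[0] ^ w[1]]

var_of_sum = var(lambda w: sum(X(w) for X in Xs))
sum_of_vars = sum(var(X) for X in Xs)


def joint(f, g, s, t):
    """P[f = s and g = t]."""
    return sum(P[w] for w in omega if f(w) == s and g(w) == t)


pairwise = all(joint(f, g, s, t) == Fraction(1, 4)
               for f, g in combinations(Xs, 2) for s in (0, 1) for t in (0, 1))
all_ones = sum(P[w] for w in omega if all(X(w) == 1 for X in Xs))
```

The event that all three variables equal 1 has probability 0 rather than 1/8, so the family is not mutually independent, yet the variance identity holds.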

Corresponding to Theorem 8.16, we have: 

Theorem 8.21. Let $X$ be a 0/1-valued random variable, with $p := P[X = 1]$ and
$q := P[X = 0] = 1 - p$. Then $\mathrm{Var}[X] = pq$.

Proof. We have $E[X] = p$ and $E[X^2] = P[X^2 = 1] = P[X = 1] = p$. Therefore,
$\mathrm{Var}[X] = E[X^2] - E[X]^2 = p - p^2 = p(1 - p) = pq$. □


Let $B$ be an event with $P[B] \ne 0$, and let $X$ be a real-valued random variable.
We define the conditional expectation of $X$ given $B$, denoted $E[X \mid B]$, to be the
expected value of $X$ relative to the conditional distribution $P(\cdot \mid B)$, so that
$$E[X \mid B] = \sum_{\omega \in \Omega} X(\omega)\, P(\omega \mid B) = \frac{1}{P[B]} \sum_{\omega \in B} X(\omega)\, P(\omega).$$

Analogous to (8.19), if $S$ is the image of $X$, we have
$$E[X \mid B] = \sum_{s \in S} s\, P[X = s \mid B]. \qquad (8.23)$$

Furthermore, suppose $I$ is a finite index set, and $\{B_i\}_{i \in I}$ is a partition of the sample
space, where each $B_i$ occurs with non-zero probability. If for each $i \in I$ we group
together the terms in (8.18) with $\omega \in B_i$, we obtain the law of total expectation:
$$E[X] = \sum_{i \in I} E[X \mid B_i]\, P[B_i]. \qquad (8.24)$$

Example 8.29. Let $X$ be uniformly distributed over $\{1, \ldots, m\}$. Let us compute
$E[X]$ and $\mathrm{Var}[X]$. We have
$$E[X] = \sum_{s=1}^{m} s \cdot \frac{1}{m} = \frac{m(m+1)}{2} \cdot \frac{1}{m} = \frac{m+1}{2}.$$
We also have
$$E[X^2] = \sum_{s=1}^{m} s^2 \cdot \frac{1}{m} = \frac{m(m+1)(2m+1)}{6} \cdot \frac{1}{m} = \frac{(m+1)(2m+1)}{6}.$$
Therefore,
$$\mathrm{Var}[X] = E[X^2] - E[X]^2 = \frac{m^2 - 1}{12}. \qquad \square$$
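The closed forms in Example 8.29 are easy to confirm with exact arithmetic. The following sketch (our own check; `uniform_mean_var` is a hypothetical name) sums directly over the distribution and uses part (i) of Theorem 8.18.

```python
from fractions import Fraction


def uniform_mean_var(m):
    """Exact E[X] and Var[X] = E[X^2] - E[X]^2 for X uniform over {1, ..., m}."""
    p = Fraction(1, m)
    mean = sum(s * p for s in range(1, m + 1))
    second_moment = sum(s * s * p for s in range(1, m + 1))
    return mean, second_moment - mean ** 2
```

For a fair die ($m = 6$) this gives $E[X] = 7/2$ and $\mathrm{Var}[X] = 35/12$.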


Example 8.30. Let $X$ denote the value of a roll of a die. Let $A$ be the event that $X$
is even. Then the conditional distribution of $X$ given $A$ is essentially the uniform
distribution on $\{2, 4, 6\}$, and hence
$$E[X \mid A] = \frac{2 + 4 + 6}{3} = 4.$$
Similarly, the conditional distribution of $X$ given $\overline{A}$ is essentially the uniform
distribution on $\{1, 3, 5\}$, and so
$$E[X \mid \overline{A}] = \frac{1 + 3 + 5}{3} = 3.$$
Using the law of total expectation, we can compute the expected value of $X$ as
follows:
$$E[X] = E[X \mid A]\, P[A] + E[X \mid \overline{A}]\, P[\overline{A}] = 4 \cdot \frac{1}{2} + 3 \cdot \frac{1}{2} = \frac{7}{2},$$
which agrees with the calculation in the previous example. □
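The same calculation can be written as a small sketch of (8.23) and (8.24). This is our own illustration; `prob` and `cond_expect` are hypothetical helpers, and the conditioning event is assumed to have non-zero probability.

```python
from fractions import Fraction

die = {s: Fraction(1, 6) for s in range(1, 7)}   # distribution of one fair die


def prob(dist, event):
    return sum(p for s, p in dist.items() if event(s))


def cond_expect(dist, event):
    """E[X | B] as in (8.23): restrict to B and renormalize by P[B] (assumed non-zero)."""
    return sum(s * p for s, p in dist.items() if event(s)) / prob(dist, event)


def even(s):
    return s % 2 == 0


def odd(s):
    return s % 2 == 1


# Law of total expectation (8.24) with the partition {even, odd}.
total = cond_expect(die, even) * prob(die, even) + cond_expect(die, odd) * prob(die, odd)
```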


Example 8.31. Let $X$ be a random variable with a binomial distribution, as in
Example 8.18, that counts the number of successes among $n$ Bernoulli trials, each
of which succeeds with probability $p$. Let us compute $E[X]$ and $\mathrm{Var}[X]$. We can
write $X$ as the sum of indicator variables, $X = \sum_{i=1}^{n} X_i$, where $X_i$ is the indicator
variable for the event that the $i$th trial succeeds; each $X_i$ takes the value 1 with
probability $p$ and 0 with probability $q := 1 - p$, and the family of random variables
$\{X_i\}_{i=1}^{n}$ is mutually independent (see Example 8.24). By Theorems 8.16 and 8.21,
we have $E[X_i] = p$ and $\mathrm{Var}[X_i] = pq$ for $i = 1, \ldots, n$. By linearity of expectation,
we have
$$E[X] = \sum_{i=1}^{n} E[X_i] = np.$$
By Theorem 8.20, and the fact that $\{X_i\}_{i=1}^{n}$ is mutually independent (and hence
pairwise independent), we have
$$\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = npq. \qquad \square$$
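The indicator-variable argument avoids any manipulation of binomial sums. As a cross-check, the sketch below (our own; `binomial_mean_var` is a hypothetical name) computes the mean and variance the long way, directly from the probabilities $P[X = k] = \binom{n}{k} p^k q^{n-k}$, in exact arithmetic.

```python
from fractions import Fraction
from math import comb


def binomial_mean_var(n, p):
    """E[X] and Var[X] computed directly from P[X = k] = C(n, k) p^k q^(n - k)."""
    q = 1 - p
    pmf = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}
    mean = sum(k * pk for k, pk in pmf.items())
    second_moment = sum(k * k * pk for k, pk in pmf.items())
    return mean, second_moment - mean ** 2
```

Passing `p` as a `Fraction` keeps every step exact, so the results can be compared with $np$ and $npq$ using `==`.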
Example 8.32. Our proof of Theorem 8.1 could be elegantly recast in terms of
indicator variables. For $B \subseteq \Omega$, let $X_B$ be the indicator variable for $B$, so that
$X_B(\omega) = \delta_\omega[B]$ for each $\omega \in \Omega$. Equation (8.8) then becomes
$$X_A = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} X_{A_J},$$
and by Theorem 8.16 and linearity of expectation, we have
$$P[A] = E[X_A] = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} E[X_{A_J}] = \sum_{\emptyset \subsetneq J \subseteq I} (-1)^{|J|-1} P[A_J]. \qquad \square$$
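To see the indicator-variable identity in action, the sketch below uses a hypothetical example of our own: the uniform space $\{1, \ldots, 12\}$ with events $A_1, A_2, A_3$ consisting of the multiples of 2, 3, and 4. It assumes our reading of the notation of Theorem 8.1, namely that $A$ is the union of the $A_i$ and $A_J$ is the intersection of the $A_j$ for $j \in J$. It checks the pointwise identity for $X_A$ and then compares both sides of the expectation identity.

```python
from fractions import Fraction
from itertools import combinations

omega = range(1, 13)                       # uniform sample space {1, ..., 12}
P = {w: Fraction(1, 12) for w in omega}
events = [{w for w in omega if w % d == 0} for d in (2, 3, 4)]   # A_1, A_2, A_3
A = set().union(*events)                   # the union of the A_i


def indicator(B):
    """The indicator variable X_B."""
    return lambda w: 1 if w in B else 0


def expect(f):
    return sum(f(w) * P[w] for w in omega)


# All non-empty J, each paired with its size |J| and the intersection A_J.
subfamilies = [(len(J), set.intersection(*J))
               for r in range(1, len(events) + 1) for J in combinations(events, r)]

# Pointwise identity: X_A(w) = sum over J of (-1)^(|J|-1) X_{A_J}(w).
pointwise = all(indicator(A)(w) == sum((-1) ** (size - 1) * indicator(A_J)(w)
                                       for size, A_J in subfamilies)
                for w in omega)

p_union = expect(indicator(A))
rhs = sum((-1) ** (size - 1) * expect(indicator(A_J)) for size, A_J in subfamilies)
```

Here the union is $\{2, 3, 4, 6, 8, 9, 10, 12\}$, so both sides equal $2/3$.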

Exercise 8.20. Suppose $X$ is a real-valued random variable. Show that
$|E[X]| \le E[|X|] \le E[X^2]^{1/2}$.

Exercise 8.21. Suppose $X$ and $Y$ take non-negative real values, and that $Y \le c$
for some constant $c$. Show that $E[XY] \le c\, E[X]$.

Exercise 8.22. Let $X$ be a 0/1-valued random variable. Show that $\mathrm{Var}[X] \le 1/4$.

Exercise 8.23. Let $B$ be an event with $P[B] \ne 0$, and let $\{B_i\}_{i \in I}$ be a finite,
pairwise disjoint family of events whose union is $B$. Generalizing the law of
total expectation (8.24), show that for every real-valued random variable $X$, if
$I^* := \{i \in I : P[B_i] \ne 0\}$, then we have
$$E[X \mid B]\, P[B] = \sum_{i \in I^*} E[X \mid B_i]\, P[B_i].$$
Also show that if $E[X \mid B_i] \le \alpha$ for each $i \in I^*$, then $E[X \mid B] \le \alpha$.

Exercise 8.24. Let $B$ be an event with $P[B] \ne 0$, and let $\{C_i\}_{i \in I}$ be a finite,
pairwise disjoint family of events whose union contains $B$. Again, generalizing
the law of total expectation, show that for every real-valued random variable $X$, if
$I^* := \{i \in I : P[B \cap C_i] \ne 0\}$, then we have
$$E[X \mid B] = \sum_{i \in I^*} E[X \mid B \cap C_i]\, P[C_i \mid B].$$

Exercise 8.25. This exercise makes use of the notion of convexity (see §A8).

(a) Prove Jensen’s inequality: if $f$ is convex on an interval, and $X$ is a random
variable taking values in that interval, then $E[f(X)] \ge f(E[X])$. Hint: use
induction on the size of the image of $X$. (Note that Theorem 8.19 is a special
case of this, with $f(s) := s^2$.)

(b) Using part (a), show that if $X$ takes non-negative real values, and $\alpha$ is a
positive number, then $E[X^\alpha] \ge E[X]^\alpha$ if $\alpha \ge 1$, and $E[X^\alpha] \le E[X]^\alpha$ if
$\alpha \le 1$.
(c) Using part (a), show that if $X$ takes positive real values, then $E[X] \ge e^{E[\log X]}$.

(d) Using part (c), derive the arithmetic/geometric mean inequality: for all
positive numbers $x_1, \ldots, x_n$, we have
$$(x_1 + \cdots + x_n)/n \ge (x_1 \cdots x_n)^{1/n}.$$


Exercise 8.26. For real-valued random variables $X$ and $Y$, their covariance is
defined as $\mathrm{Cov}[X, Y] := E[XY] - E[X]\, E[Y]$. Show that:

(a) if $X$, $Y$, and $Z$ are real-valued random variables, and $a$ is a real number, then
$\mathrm{Cov}[X + Y, Z] = \mathrm{Cov}[X, Z] + \mathrm{Cov}[Y, Z]$ and $\mathrm{Cov}[aX, Z] = a\, \mathrm{Cov}[X, Z]$;

(b) if $\{X_i\}_{i \in I}$ is a finite family of real-valued random variables, then
$$\mathrm{Var}\Big[\sum_{i \in I} X_i\Big] = \sum_{i \in I} \mathrm{Var}[X_i] + \sum_{\substack{i, j \in I \\ i \ne j}} \mathrm{Cov}[X_i, X_j].$$


Exercise 8.27. Let $f : [0, 1] \to \mathbb{R}$ be a function that is “nice” in the following
sense: for some constant $c$, we have $|f(s) - f(t)| \le c|s - t|$ for all $s, t \in [0, 1]$. This
condition is implied, for example, by the assumption that $f$ has a derivative that
is bounded in absolute value by $c$ on the interval $[0, 1]$. For each positive integer
$n$, define the polynomial $B_{n,f} := \sum_{k=0}^{n} \binom{n}{k} f(k/n)\, T^k (1 - T)^{n-k} \in \mathbb{R}[T]$. Show
that $|B_{n,f}(p) - f(p)| \le c/(2\sqrt{n})$ for all positive integers $n$ and all $p \in [0, 1]$. Hint:
let $X$ be a random variable with a binomial distribution that counts the number of
successes among $n$ Bernoulli trials, each of which succeeds with probability $p$, and
begin by observing that $B_{n,f}(p) = E[f(X/n)]$. The polynomial $B_{n,f}$ is called the
$n$th Bernstein approximation to $f$, and this result proves a classical result that
any “nice” function can be approximated to arbitrary precision by a polynomial of
sufficiently high degree.

Exercise 8.28. Consider again the game played between Alice and Bob in
Example 8.11. Suppose that to play the game, Bob must place a one-dollar bet.
However, after Alice reveals the sum of the two dice, Bob may elect to double his
bet. If Bob’s guess is correct, Alice pays him his bet, and otherwise Bob pays Alice
his bet. Describe an optimal playing strategy for Bob, and calculate his expected
winnings.

Exercise 8.29. A die is rolled repeatedly until it comes up “1,” or until it is rolled 
n times (whichever comes first). What is the expected number of rolls of the die? 
In this section, we present several theorems that can be used to bound the probability that a random variable deviates from its expected value by some specified
amount.


Theorem 8.22 (Markov’s inequality). Let $X$ be a random variable that takes only
non-negative real values. Then for every $\alpha > 0$, we have
$$P[X \ge \alpha] \le E[X]/\alpha.$$


Proof. We have
$$E[X] = \sum_{s} s\, P[X = s] = \sum_{s < \alpha} s\, P[X = s] + \sum_{s \ge \alpha} s\, P[X = s],$$
where the summations are over elements $s$ in the image of $X$. Since $X$ takes only
non-negative values, all of the terms are non-negative. Therefore,
$$E[X] \ge \sum_{s \ge \alpha} s\, P[X = s] \ge \sum_{s \ge \alpha} \alpha\, P[X = s] = \alpha\, P[X \ge \alpha]. \qquad \square$$

Markov’s inequality may be the only game in town when nothing more about 
the distribution of X is known besides its expected value. However, if the variance 
of X is also known, then one can get a better bound. 


Theorem 8.23 (Chebyshev’s inequality). Let $X$ be a real-valued random variable,
with $\mu := E[X]$ and $\nu := \mathrm{Var}[X]$. Then for every $\alpha > 0$, we have
$$P[|X - \mu| \ge \alpha] \le \nu/\alpha^2.$$

Proof. Let $Y := (X - \mu)^2$. Then $Y$ is always non-negative, and $E[Y] = \nu$. Applying
Markov’s inequality to $Y$, we have
$$P[|X - \mu| \ge \alpha] = P[Y \ge \alpha^2] \le \nu/\alpha^2. \qquad \square$$


An important special case of Chebyshev’s inequality is the following. Suppose
that $\{X_i\}_{i \in I}$ is a finite, non-empty, pairwise independent family of real-valued
random variables, each with the same distribution. Let $\mu$ be the common value of
$E[X_i]$, $\nu$ be the common value of $\mathrm{Var}[X_i]$, and $n := |I|$. Set
$$\overline{X} := \frac{1}{n} \sum_{i \in I} X_i.$$
The variable $\overline{X}$ is called the sample mean of $\{X_i\}_{i \in I}$. By the linearity of
expectation, we have $E[\overline{X}] = \mu$, and since $\{X_i\}_{i \in I}$ is pairwise independent, it follows from
Theorem 8.20 (along with part (ii) of Theorem 8.18) that $\mathrm{Var}[\overline{X}] = \nu/n$. Applying
Chebyshev’s inequality, for every $\varepsilon > 0$, we have
$$P[|\overline{X} - \mu| \ge \varepsilon] \le \frac{\nu}{n\varepsilon^2}. \qquad (8.25)$$

The inequality (8.25) says that for all $\varepsilon > 0$, and for all $\delta > 0$, there exists $n_0$
(depending on $\varepsilon$ and $\delta$, as well as the variance $\nu$) such that $n \ge n_0$ implies
$$P[|\overline{X} - \mu| \ge \varepsilon] \le \delta. \qquad (8.26)$$
In words:

As $n$ gets large, the sample mean closely approximates the expected
value $\mu$ with high probability.

This fact, known as the law of large numbers, justifies the usual intuitive
interpretation given to expectation.

Let us now examine an even more specialized case of the above situation, where
each $X_i$ is a 0/1-valued random variable, taking the value 1 with probability $p$, and
0 with probability $q := 1 - p$. By Theorems 8.16 and 8.21, the $X_i$’s have a common
expected value $p$ and variance $pq$. Therefore, by (8.25), for every $\varepsilon > 0$, we have
$$P[|\overline{X} - p| \ge \varepsilon] \le \frac{pq}{n\varepsilon^2}. \qquad (8.27)$$

The bound on the right-hand side of (8.27) decreases only in proportion to $1/n$. If one makes
the stronger assumption that the family $\{X_i\}_{i \in I}$ is mutually independent (so that
$X := \sum_i X_i$ has a binomial distribution), one can obtain a much better bound that
decreases exponentially in $n$:

Theorem 8.24 (Chernoff bound). Let $\{X_i\}_{i \in I}$ be a finite, non-empty, and mutually
independent family of random variables, such that each $X_i$ is 1 with probability
$p$ and 0 with probability $q := 1 - p$. Assume that $0 < p < 1$. Also, let $n := |I|$ and
$\overline{X}$ be the sample mean of $\{X_i\}_{i \in I}$. Then for every $\varepsilon > 0$, we have:

(i) $P[\overline{X} - p \ge \varepsilon] \le e^{-n\varepsilon^2/2q}$;

(ii) $P[\overline{X} - p \le -\varepsilon] \le e^{-n\varepsilon^2/2p}$;

(iii) $P[|\overline{X} - p| \ge \varepsilon] \le 2e^{-n\varepsilon^2/2}$.

Proof. First, we observe that (ii) follows directly from (i) by replacing $X_i$ by $1 - X_i$
and exchanging the roles of $p$ and $q$. Second, we observe that (iii) follows directly
from (i) and (ii). Thus, it suffices to prove (i).

Let $\alpha > 0$ be a parameter, whose value will be determined later. Define the
random variable $Z := e^{\alpha n(\overline{X} - p)}$. Since the function $x \mapsto e^{\alpha n x}$ is strictly increasing,
we have $\overline{X} - p \ge \varepsilon$ if and only if $Z \ge e^{\alpha n\varepsilon}$. By Markov’s inequality, it follows that
$$P[\overline{X} - p \ge \varepsilon] = P[Z \ge e^{\alpha n\varepsilon}] \le E[Z]\, e^{-\alpha n\varepsilon}. \qquad (8.28)$$
So our goal is to bound $E[Z]$ from above.

For each $i \in I$, define the random variable $Z_i := e^{\alpha(X_i - p)}$. Observe that
$Z = \prod_{i \in I} Z_i$, that $\{Z_i\}_{i \in I}$ is a mutually independent family of random variables
(see Theorem 8.12), and that for each $i \in I$, we have
$$E[Z_i] = e^{\alpha(1 - p)} p + e^{\alpha(0 - p)} q = p e^{\alpha q} + q e^{-\alpha p}.$$
It follows that
$$E[Z] = E\Big[\prod_{i \in I} Z_i\Big] = \prod_{i \in I} E[Z_i] = (p e^{\alpha q} + q e^{-\alpha p})^n.$$


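We will prove below that
$$p e^{\alpha q} + q e^{-\alpha p} \le e^{\alpha^2 q/2}. \qquad (8.29)$$
From this, it follows that
$$E[Z] \le e^{\alpha^2 q n/2}. \qquad (8.30)$$
Combining (8.30) with (8.28), we obtain
$$P[\overline{X} - p \ge \varepsilon] \le e^{\alpha^2 q n/2 - \alpha n\varepsilon}. \qquad (8.31)$$
Now we choose the parameter $\alpha$ so as to minimize the quantity $\alpha^2 q n/2 - \alpha n\varepsilon$. The
optimal value of $\alpha$ is easily seen to be $\alpha = \varepsilon/q$, and substituting this value of $\alpha$ into
(8.31) turns the exponent into $n\varepsilon^2/2q - n\varepsilon^2/q = -n\varepsilon^2/2q$, which yields (i).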

To finish the proof of the theorem, it remains to prove the inequality (8.29). Let
$$\beta := p e^{\alpha q} + q e^{-\alpha p}.$$
We want to show that $\beta \le e^{\alpha^2 q/2}$, or equivalently, that $\log \beta \le \alpha^2 q/2$. We have
$$\beta = e^{\alpha q}(p + q e^{-\alpha}) = e^{\alpha q}\big(1 - q(1 - e^{-\alpha})\big),$$
and taking logarithms and applying parts (i) and (ii) of §A1, we obtain
$$\log \beta = \alpha q + \log\big(1 - q(1 - e^{-\alpha})\big) \le \alpha q - q(1 - e^{-\alpha}) = q(e^{-\alpha} + \alpha - 1) \le q\alpha^2/2.$$
This establishes (8.29) and completes the proof of the theorem. □
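The key analytic step is inequality (8.29). As a purely numerical sanity check (not a substitute for the proof), the sketch below evaluates both sides on a grid of values $\alpha \ge 0$ and $p \in (0, 1)$; it also confirms that the choice $\alpha = \varepsilon/q$ turns the exponent in (8.31) into $-n\varepsilon^2/2q$. The grid, the tolerance, and the helper names are arbitrary choices of ours.

```python
import math


def lhs(alpha, p):
    """Left side of (8.29): p e^{alpha q} + q e^{-alpha p}."""
    q = 1 - p
    return p * math.exp(alpha * q) + q * math.exp(-alpha * p)


def rhs(alpha, p):
    """Right side of (8.29): e^{alpha^2 q / 2}."""
    return math.exp(alpha * alpha * (1 - p) / 2)


def exponent_8_31(alpha, q, n, eps):
    """The exponent alpha^2 q n / 2 - alpha n eps appearing in (8.31)."""
    return alpha * alpha * q * n / 2 - alpha * n * eps


# alpha ranges over 0.0, 0.1, ..., 10.0 and p over 0.05, 0.10, ..., 0.95;
# the tiny relative tolerance absorbs floating-point rounding at alpha = 0.
grid_ok = all(lhs(a / 10, k / 20) <= rhs(a / 10, k / 20) * (1 + 1e-12)
              for a in range(0, 101) for k in range(1, 20))
```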

Thus, the Chernoff bound is a quantitatively superior version of the law of large 
numbers, although its range of application is clearly more limited. 

Example 8.33. Suppose we toss a fair coin 10,000 times. The expected number
of heads is 5,000. What is an upper bound on the probability $\alpha$ that we get 6,000
or more heads? Using Markov’s inequality, we get $\alpha \le 5/6$. Using Chebyshev’s
inequality, and in particular, the inequality (8.27) with $\varepsilon = 1/10$, we get
$$\alpha \le \frac{1/4}{10^4 \cdot 10^{-2}} = \frac{1}{400}.$$
Finally, using the Chernoff bound, we obtain
$$\alpha \le e^{-10^4 \cdot 10^{-2}/2(0.5)} = e^{-100} \approx 10^{-43.4}. \qquad \square$$
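The three bounds in Example 8.33 are easy to reproduce, and for a fair coin the exact tail probability can also be computed from exact integers. The sketch below is our own check, not part of the text; it relies on Python's `int / int` division, which works here even though both integers are far beyond the range of a float.

```python
import math
from fractions import Fraction

n = 10_000                      # number of tosses of a fair coin
p = q = Fraction(1, 2)
threshold = 6_000               # we bound P[X >= 6000], where X counts heads
eps = Fraction(threshold, n) - p          # 1/10

markov = n * p / threshold                # Markov: E[X] / 6000
chebyshev = p * q / (n * eps ** 2)        # (8.27): pq / (n eps^2)
chernoff = math.exp(-float(n * eps ** 2 / (2 * q)))   # Theorem 8.24(i)

# Exact binomial tail: (number of outcomes with >= 6000 heads) / 2^n.
exact = sum(math.comb(n, k) for k in range(threshold, n + 1)) / 2 ** n
```

The exact tail turns out to be far smaller even than the Chernoff bound, which illustrates that these bounds trade tightness for generality.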

Exercise 8.30. With notation and assumptions as in Theorem 8.24, and with
$p := q := 1/2$, show that there exist constants $c_1$ and $c_2$ such that
$$P[|\overline{X} - 1/2| \ge c_1/\sqrt{n}] \le 1/2 \quad\text{and}\quad P[|\overline{X} - 1/2| \ge c_2/\sqrt{n}] \ge 1/2.$$
Hint: for the second inequality, use Exercise 5.16.

Exercise 8.31. In each step of a random walk, we toss a coin, and move either
one unit to the right, or one unit to the left, depending on the outcome of the
coin toss. The question is, after $n$ steps, what is our expected distance from the
starting point? Let us model this using a mutually independent family of random
variables $\{Y_i\}_{i=1}^{n}$, with each $Y_i$ uniformly distributed over $\{-1, 1\}$, and define
$Y := Y_1 + \cdots + Y_n$. Show that $c_1\sqrt{n} \le E[|Y|] \le c_2\sqrt{n}$, for some constants $c_1$
and $c_2$.

Exercise 8.32. The goal of this exercise is to prove that with probability very
close to 1, a random number between 1 and $m$ has very close to $\log\log m$ prime
factors. To prove this result, you will need to use appropriate theorems from
Chapter 5. Suppose $N$ is a random variable that is uniformly distributed over $\{1, \ldots, m\}$,
where $m \ge 3$. For $i = 1, \ldots, m$, let $D_i$ be the indicator variable for the event that $i$
divides $N$. Also, define $X := \sum_{p \le m} D_p$, where the sum is over all primes $p \le m$, so
that $X$ counts the number of distinct primes dividing $N$. Show that:

(a) $1/i - 1/m < E[D_i] \le 1/i$, for each $i = 1, \ldots, m$;

(b) $|E[X] - \log\log m| \le c_1$ for some constant $c_1$;

(c) for all primes $p, q$, where $p \le m$, $q \le m$, and $p \ne q$, we have
$$\mathrm{Cov}[D_p, D_q] \le \frac{1}{m}\Big(\frac{1}{p} + \frac{1}{q}\Big),$$
where Cov is the covariance, as defined in Exercise 8.26;

(d) $\mathrm{Var}[X] \le \log\log m + c_2$ for some constant $c_2$;

(e) for some constant $c_3$, and for every $\alpha \ge 1$, we have
$$P\big[|X - \log\log m| \ge \alpha(\log\log m)^{1/2}\big] \le \alpha^{-2}\big(1 + c_3(\log\log m)^{-1/2}\big).$$

Exercise 8.33. For each positive integer $n$, let $\tau(n)$ denote the number of positive
divisors of $n$. Suppose that $N$ is uniformly distributed over $\{1, \ldots, m\}$. Show that
$E[\tau(N)] = \log m + O(1)$.
Exercise 8.34. You are given three biased coins, where for $i = 1, 2, 3$, coin $i$
comes up heads with probability $p_i$. The coins look identical, and all you know is
the following: (1) $|p_1 - p_2| > 0.01$ and (2) either $p_3 = p_1$ or $p_3 = p_2$. Your goal
is to determine whether $p_3$ is equal to $p_1$, or to $p_2$. Design a random experiment
to determine this. The experiment may produce an incorrect result, but this should
happen with probability at most $10^{-12}$. Try to use a reasonable number of coin
tosses.

Exercise 8.35. Consider the following game, parameterized by a positive integer
$n$. One rolls a pair of dice, and records the value of their sum. This is repeated until
some value $\ell$ is recorded $n$ times, and this value $\ell$ is declared the “winner.” It is
intuitively clear that 7 is the most likely winner. Let $\alpha_n$ be the probability that 7
does not win. Give a careful argument that $\alpha_n \to 0$ as $n \to \infty$. Assume that the
rolls of the dice are mutually independent.


8.6 Balls and bins 

This section and the next discuss applications of the theory developed so far.
Our first application is a brief study of “balls and bins.” Suppose you throw $n$
balls into $m$ bins. A number of questions naturally arise, such as:

• What is the probability that a collision occurs, that is, two balls land in the 
same bin? 

• What is the expected value of the maximum number of balls that land in 
any one bin? 

To formalize these questions, we introduce some notation that will be used
throughout this section. Let $I$ be a finite set of size $n > 0$, and $S$ a finite set
of size $m > 0$. Let $\{X_i\}_{i \in I}$ be a family of random variables, where each $X_i$ is
uniformly distributed over the set $S$. The idea is that $I$ represents a set of labels
for our $n$ balls, $S$ represents the set of $m$ bins, and $X_i$ represents the bin into which
ball $i$ lands.

We define $C$ to be the event that a collision occurs; formally, this is the event that
$X_i = X_j$ for some $i, j \in I$ with $i \ne j$. We also define $M$ to be the random variable
that measures the maximum number of balls in any one bin; formally,
$$M := \max\{N_s : s \in S\},$$
where for each $s \in S$, $N_s$ is the number of balls that land in bin $s$; that is,
$$N_s := |\{i \in I : X_i = s\}|.$$

The questions posed above can now be stated as the problems of estimating $P[C]$
and $E[M]$. However, to estimate these quantities, we have to make some
assumptions about the independence of the $X_i$’s. While it is natural to assume that the
family of random variables $\{X_i\}_{i \in I}$ is mutually independent, it is also interesting
and useful to estimate these quantities under weaker independence assumptions.
We shall therefore begin with an analysis under the weaker assumption that $\{X_i\}_{i \in I}$
is pairwise independent. We start with a simple observation:


Theorem 8.25. Suppose $\{X_i\}_{i \in I}$ is pairwise independent. Then for all $i, j \in I$
with $i \ne j$, we have $P[X_i = X_j] = 1/m$.

Proof. The event $X_i = X_j$ occurs if and only if $X_i = s$ and $X_j = s$ for some $s \in S$.
Therefore,
$$\begin{aligned} P[X_i = X_j] &= \sum_{s \in S} P[(X_i = s) \cap (X_j = s)] && \text{(by Boole’s equality (8.7))} \\ &= \sum_{s \in S} 1/m^2 && \text{(by pairwise independence)} \\ &= 1/m. \qquad \square \end{aligned}$$
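Theorem 8.25 uses only pairwise independence. As an illustration (our own choice of family, not from the text), the sketch below builds $2^k - 1$ random variables from $k$ fair bits by taking the parity of each non-empty subset of the bits. These variables are uniform over $S = \{0, 1\}$ (so $m = 2$) and pairwise independent, though not mutually independent, and exact enumeration confirms $P[X_i = X_j] = 1/m$.

```python
from fractions import Fraction
from itertools import combinations, product

k = 3
seeds = list(product((0, 1), repeat=k))   # k independent fair bits
prob = Fraction(1, len(seeds))
# Labels: the non-empty subsets J of {0, ..., k-1}; there are 2**k - 1 of them.
labels = [J for r in range(1, k + 1) for J in combinations(range(k), r)]


def X(J, bits):
    """X_J = parity (XOR) of the bits indexed by J, uniform over S = {0, 1}."""
    return sum(bits[j] for j in J) % 2


def joint(J1, J2, s, t):
    """P[X_J1 = s and X_J2 = t]."""
    return sum(prob for b in seeds if X(J1, b) == s and X(J2, b) == t)


pairwise = all(joint(J1, J2, s, t) == Fraction(1, 4)
               for J1, J2 in combinations(labels, 2) for s in (0, 1) for t in (0, 1))
collision = {(J1, J2): sum(prob for b in seeds if X(J1, b) == X(J2, b))
             for J1, J2 in combinations(labels, 2)}
```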


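Theorem 8.26. Suppose $\{X_i\}_{i \in I}$ is pairwise independent. Then
$$P[C] \le \frac{n(n - 1)}{2m}.$$

Proof. Let $I^{(2)} := \{J \subseteq I : |J| = 2\}$. Then using Boole’s inequality (8.6) and
Theorem 8.25, we have
$$P[C] \le \sum_{\{i,j\} \in I^{(2)}} P[X_i = X_j] = \sum_{\{i,j\} \in I^{(2)}} \frac{1}{m} = \frac{|I^{(2)}|}{m} = \frac{n(n - 1)}{2m}. \qquad \square$$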


Theorem 8.27. Suppose $\{X_i\}_{i \in I}$ is pairwise independent. Then
$$E[M] \le \sqrt{n^2/m + n}.$$

Proof. To prove this, we use the fact that $E[M]^2 \le E[M^2]$ (see Theorem 8.19), and
that $M^2 \le Z := \sum_{s \in S} N_s^2$. It will therefore suffice to show that
$$E[Z] \le n^2/m + n. \qquad (8.32)$$


To this end, for $i \in I$ and $s \in S$, let $L_{is}$ be the indicator variable for the event that
ball $i$ lands in bin $s$ (i.e., $X_i = s$), and for $i, j \in I$, let $C_{ij}$ be the indicator variable
for the event that balls $i$ and $j$ land in the same bin (i.e., $X_i = X_j$). Observing that
$C_{ij} = \sum_{s \in S} L_{is} L_{js}$, we have
$$Z = \sum_{s \in S} N_s^2 = \sum_{s \in S} \Big(\sum_{i \in I} L_{is}\Big)^2 = \sum_{s \in S} \sum_{i \in I} \sum_{j \in I} L_{is} L_{js} = \sum_{i, j \in I} \sum_{s \in S} L_{is} L_{js} = \sum_{i, j \in I} C_{ij}.$$
For $i, j \in I$, we have $E[C_{ij}] = P[X_i = X_j]$ (see Theorem 8.16), and so by
Theorem 8.25, we have $E[C_{ij}] = 1/m$ if $i \ne j$, and clearly, $E[C_{ij}] = 1$ if $i = j$. By
linearity of expectation, we have
$$E[Z] = \sum_{i, j \in I} E[C_{ij}] = \sum_{\substack{i, j \in I \\ i \ne j}} E[C_{ij}] + \sum_{i \in I} E[C_{ii}] = \frac{n(n - 1)}{m} + n \le n^2/m + n,$$
which proves (8.32). □
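Because mutual independence implies pairwise independence, the identity $E[Z] = n(n - 1)/m + n$ from the proof can be checked by brute force in a few tiny cases. The sketch below (our own illustration; `exact_Z_and_M` is a hypothetical helper) enumerates every outcome of throwing $n$ balls independently and uniformly into $m$ bins, and also checks the resulting bound of Theorem 8.27 on $E[M]$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import math


def exact_Z_and_M(n, m):
    """Exact E[Z] and E[M] for n balls thrown independently and uniformly into
    m bins, where Z is the sum of squared bin loads and M is the maximum load.
    Enumerates all m**n equally likely outcomes, so keep n and m small."""
    total_Z = total_M = 0
    for outcome in product(range(m), repeat=n):
        loads = Counter(outcome).values()   # loads of the non-empty bins
        total_Z += sum(c * c for c in loads)
        total_M += max(loads)
    return Fraction(total_Z, m ** n), Fraction(total_M, m ** n)
```

Empty bins contribute nothing to $Z$ and cannot be the maximum when $n \ge 1$, so only non-empty bins need to be counted.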

We next consider the situation where $\{X_i\}_{i \in I}$ is mutually independent. Of
course, Theorem 8.26 is still valid in this case, but with our stronger assumption,
we can derive a lower bound on $P[C]$.


Theorem 8.28. Suppose $\{X_i\}_{i \in I}$ is mutually independent. Then
$$P[C] \ge 1 - e^{-n(n - 1)/2m}.$$

Proof. Let $\alpha := P[\overline{C}]$. We want to show $\alpha \le e^{-n(n - 1)/2m}$. We may assume that
$I = \{1, \ldots, n\}$ (the labels make no difference) and that $n \le m$ (otherwise, $\alpha = 0$).

Under the hypothesis of the theorem, the random variable $(X_1, \ldots, X_n)$ is uniformly
distributed over $S^{\times n}$. Among all $m^n$ sequences $(s_1, \ldots, s_n) \in S^{\times n}$, there are a total
of $m(m - 1) \cdots (m - n + 1)$ that contain no repetitions: there are $m$ choices for $s_1$,
and for any fixed value of $s_1$, there are $m - 1$ choices for $s_2$, and so on. Therefore
$$\alpha = m(m - 1) \cdots (m - n + 1)/m^n = \Big(1 - \frac{1}{m}\Big)\Big(1 - \frac{2}{m}\Big) \cdots \Big(1 - \frac{n - 1}{m}\Big).$$
Using part (i) of §A1, we obtain
$$\alpha \le e^{-\sum_{i=1}^{n-1} i/m} = e^{-n(n - 1)/2m}. \qquad \square$$


Theorem 8.26 implies that if $n(n - 1) \le m$, then the probability of a collision is
at most 1/2; moreover, Theorem 8.28 implies that if $n(n - 1) \ge (2 \log 2)m$, then
the probability of a collision is at least 1/2. Thus, for $n$ near $\sqrt{m}$, the probability
of a collision is roughly 1/2. A colorful illustration of this is the following fact: in
a room with 23 or more people, the odds are better than even that two people in the
room have birthdays on the same day of the year. This follows by setting $n = 23$
and $m = 365$ in Theorem 8.28. Here, we are ignoring leap years, and the fact that
birthdays are not uniformly distributed over the calendar year (however, any skew
in the birthday distribution only increases the odds that two people share the same
birthday; see Exercise 8.40 below). Because of this fact, Theorem 8.28 is often
called the birthday paradox (the “paradox” being the perhaps surprisingly small
number of people in the room).
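The birthday numbers can be checked directly. The sketch below (our own; the function names are hypothetical) computes the exact collision probability $1 - m(m - 1) \cdots (m - n + 1)/m^n$ from the proof of Theorem 8.28 and compares it with the lower bound of Theorem 8.28 and the upper bound of Theorem 8.26.

```python
import math


def collision_prob(n, m):
    """Exact P[C] for n mutually independent, uniform throws into m bins:
    1 - m(m-1)...(m-n+1)/m^n, as in the proof of Theorem 8.28."""
    no_collision = 1.0
    for i in range(n):
        no_collision *= (m - i) / m
    return 1 - no_collision


def lower_bound(n, m):
    return 1 - math.exp(-n * (n - 1) / (2 * m))      # Theorem 8.28


def upper_bound(n, m):
    return n * (n - 1) / (2 * m)                     # Theorem 8.26
```

For $n = 23$ and $m = 365$ the exact collision probability is about 0.507, and the lower bound of Theorem 8.28 is only barely above 1/2; with 22 people the exact probability is already below 1/2.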

The hypothesis that $\{X_i\}_{i \in I}$ is mutually independent is crucial in Theorem 8.28.
Indeed, assuming just pairwise independence, we may have $P[C] = 1/m$, even
when $n = m$ (see Exercise 8.42 below). However, useful, non-trivial lower bounds
on $P[C]$ can still be obtained under assumptions weaker than mutual independence
(see Exercise 8.43 below).

Assuming $\{X_i\}_{i \in I}$ is mutually independent, we can get a much sharper upper
bound on $E[M]$ than that provided by Theorem 8.27. For simplicity, we only
consider the case where $m = n$; in this case, Theorem 8.27 gives us the bound
$E[M] \le \sqrt{2n}$ (which cannot be substantially improved assuming only pairwise
independence; see Exercise 8.44 below).


Theorem 8.29. Suppose $\{X_i\}_{i \in I}$ is mutually independent and that $m = n$. Then
$$E[M] \le (1 + o(1)) \frac{\log n}{\log\log n}.$$


Proof. We use Theorem 8.17, which says that $E[M] = \sum_{k \ge 1} P[M \ge k]$.

Claim 1. For $k \ge 1$, we have $P[M \ge k] \le n/k!$.

To prove Claim 1, we may assume that $k \le n$ (as otherwise, $P[M \ge k] = 0$).
Let $I^{(k)} := \{J \subseteq I : |J| = k\}$. Now, $M \ge k$ if and only if there is an $s \in S$ and a
subset $J \in I^{(k)}$, such that $X_j = s$ for all $j \in J$. Therefore,
$$\begin{aligned} P[M \ge k] &\le \sum_{s \in S} \sum_{J \in I^{(k)}} P\Big[\bigcap_{j \in J} (X_j = s)\Big] && \text{(by Boole’s inequality (8.6))} \\ &= \sum_{s \in S} \sum_{J \in I^{(k)}} \prod_{j \in J} P[X_j = s] && \text{(by mutual independence)} \\ &= n \binom{n}{k} n^{-k} \le n/k!. \end{aligned}$$
That proves Claim 1.

Of course, Claim 1 is only interesting when $n/k! < 1$, since $P[M \ge k]$ is always
at most 1. Define $F(n)$ to be the smallest positive integer $k$ such that $k! \ge n$.

Claim 2. $F(n) \sim \log n / \log\log n$.

To prove this, let us set $k := F(n)$. It is clear that $n \le k! \le nk$, and taking
logarithms, $\log n \le \log k! \le \log n + \log k$. Moreover, we have
$$\log k! = \sum_{\ell=1}^{k} \log \ell = \int_1^k \log x \, dx + O(\log k) = k \log k - k + O(\log k) \sim k \log k,$$
where we have estimated the sum by an integral (see §A5). Thus,
$\log n = \log k! + O(\log k) \sim k \log k$.
Taking logarithms again, we see that
$$\log\log n = \log k + \log\log k + o(1) \sim \log k,$$
and so $\log n \sim k \log k \sim k \log\log n$, from which Claim 2 follows.

Finally, observe that each term in the sequence $\{n/k!\}_{k=1}^{\infty}$ is at most half the
previous term. Combining this observation with Claims 1 and 2, and the fact that
$P[M \ge k]$ is always at most 1, we have
$$E[M] = \sum_{k \ge 1} P[M \ge k] = \sum_{k \le F(n)} P[M \ge k] + \sum_{k > F(n)} P[M \ge k] \le F(n) + \sum_{\ell \ge 1} 2^{-\ell} = F(n) + 1 \sim \log n / \log\log n. \qquad \square$$
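To get a feel for Claim 2, the sketch below (our own illustration; `F` mirrors the definition in the proof and `ratio` is a hypothetical helper) computes $F(n)$ exactly with integer arithmetic and compares it with $\log n / \log\log n$ for a few large $n$.

```python
import math


def F(n):
    """Smallest positive integer k with k! >= n, as defined in the proof."""
    k, fact = 1, 1
    while fact < n:
        k += 1
        fact *= k
    return k


def ratio(n):
    """F(n) divided by log n / log log n; Claim 2 says this tends to 1."""
    return F(n) / (math.log(n) / math.log(math.log(n)))


for e in (6, 24, 96, 384):
    print(f"n = 10^{e}: F(n) = {F(10 ** e)}, ratio = {ratio(10 ** e):.3f}")
```

The ratio does decrease as $n$ grows, but only very slowly, which is typical of asymptotics involving $\log\log n$.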


Exercise 8.36. Let $a_1, \ldots, a_m$ be real numbers that sum to 1. Show that
$0 \le \sum_{s=1}^{m} (a_s - 1/m)^2 = \sum_{s=1}^{m} a_s^2 - 1/m$, and in particular, $\sum_{s=1}^{m} a_s^2 \ge 1/m$.

Exercise 8.37. Let $X$ and $X'$ be independent random variables, both having the
same distribution on a set $S$ of size $m$. Show that $P[X = X'] = \sum_{s \in S} P[X = s]^2 \ge 1/m$.

Exercise 8.38. Suppose that the family of random variables $X, Y, Y'$ is mutually
independent, where $X$ has image $S$, and where $Y$ and $Y'$ have the same distribution
on a set $T$. Let $\phi$ be a predicate on $S \times T$, and let $\alpha := P[\phi(X, Y)]$. Show that
$P[\phi(X, Y) \cap \phi(X, Y')] \ge \alpha^2$. In addition, show that if $Y$ and $Y'$ are both uniformly
distributed over $T$, then $P[\phi(X, Y) \cap \phi(X, Y') \cap (Y \ne Y')] \ge \alpha^2 - \alpha/|T|$.

Exercise 8.39. Let $a_1, \ldots, a_m$ be non-negative real numbers that sum to 1. Let
$S := \{1, \ldots, m\}$, and for $n = 1, \ldots, m$, let $S^{(n)} := \{T \subseteq S : |T| = n\}$, and define
$$P_n(a_1, \ldots, a_m) := \sum_{T \in S^{(n)}} \prod_{t \in T} a_t.$$
Show that $P_n(a_1, \ldots, a_m)$ is maximized when $a_1 = \cdots = a_m = 1/m$. Hint: first
argue that if $a_s < a_t$, then for every $\varepsilon \in [0, a_t - a_s]$, replacing the pair $(a_s, a_t)$ by
$(a_s + \varepsilon, a_t - \varepsilon)$ does not decrease the value of $P_n(a_1, \ldots, a_m)$.
Exercise 8.40. Suppose that $\{X_i\}_{i \in I}$ is a finite, non-empty, mutually
independent family of random variables, where each $X_i$ is uniformly distributed over a finite
set $S$. Suppose that $\{Y_i\}_{i \in I}$ is another finite, non-empty, mutually independent
family of random variables, where each $Y_i$ has the same distribution and takes
values in the set $S$. Let $\alpha$ be the probability that the $X_i$’s are distinct, and $\beta$ be the
probability that the $Y_i$’s are distinct. Using the previous exercise, show that $\beta \le \alpha$.

Exercise 8.41. Suppose $n$ balls are thrown into $m$ bins. Let $A$ be the event that
there is some bin that is empty. Assuming that the throws are mutually independent,
and that $n \ge m(\log m + t)$ for some $t > 0$, show that $P[A] \le e^{-t}$.

Exercise 8.42. Show that for every prime $p$, there exists a pairwise independent
family of random variables $\{X_i\}_{i \in \mathbb{Z}_p}$, where each $X_i$ is uniformly distributed over
$\mathbb{Z}_p$, and yet the probability that all the $X_i$’s are distinct is $1 - 1/p$.

Exercise 8.43. Let $\{X_i\}_{i=1}^{n}$ be a finite, non-empty, 4-wise independent family of
random variables, each uniformly distributed over a set $S$. Let $\alpha$ be the probability
that the $X_i$’s are distinct. For $i, j = 1, \ldots, n$, let $C_{ij}$ be the indicator variable for the
event that $X_i = X_j$, and define $K := \{(i, j) : 1 \le i \le n - 1,\ i + 1 \le j \le n\}$ and
$Z := \sum_{(i,j) \in K} C_{ij}$. Show that:

(a) $\{C_{ij}\}_{(i,j) \in K}$ is pairwise independent;

(b) $E[Z] = n(n - 1)/2m$ and $\mathrm{Var}[Z] = (1 - 1/m)\, E[Z]$;

(c) $\alpha \le 1/E[Z]$;

(d) $\alpha \le 1/2$, provided $n(n - 1) \ge 2m$ (hint: Exercise 8.4).

Exercise 8.44. Let $k$ be a positive integer, let $n := k^2 - k + 1$, let $I$ and $S$ be sets
of size $n$, and let $s_0$ be a fixed element of $S$. Also, let $I^{(k)} := \{J \subseteq I : |J| = k\}$,
and let $\Pi$ be the set of all permutations on $S$. For each $J \in I^{(k)}$, let $f_J$ be some
function that maps $J$ to $s_0$, and maps $I \setminus J$ injectively into $S \setminus \{s_0\}$. For $\pi \in \Pi$,
$J \in I^{(k)}$, and $i \in I$, define $\rho_i(\pi, J) := \pi(f_J(i))$. Finally, let $Y$ be uniformly
distributed over $\Pi \times I^{(k)}$, and for $i \in I$, define $X_i := \rho_i(Y)$. Show that $\{X_i\}_{i \in I}$
is pairwise independent, with each $X_i$ uniformly distributed over $S$, and yet the
number of $X_i$’s with the same value is always at least $\sqrt{n}$.

Exercise 8.45. Let $S$ be a set of size $m \ge 1$, and let $s_0$ be an arbitrary, fixed
element of $S$. Let $F$ be a random variable that is uniformly distributed over the
set of all $m^m$ functions from $S$ into $S$. Let us define random variables $X_i$, for
$i = 0, 1, 2, \ldots$, as follows:
$$X_0 := s_0, \qquad X_{i+1} := F(X_i) \quad (i = 0, 1, 2, \ldots).$$
Thus, the value of $X_i$ is obtained by applying the function $F$ a total of $i$ times to the
starting value $s_0$. Since $S$ has size $m$, the sequence $\{X_i\}_{i=0}^{\infty}$ must repeat at some
point; that is, there exists a positive integer $n$ (with $n \le m$) such that $X_n = X_i$ for
some $i = 0, \ldots, n - 1$. Define the random variable $Y$ to be the smallest such value $n$.


(a) Show that for every $i \ge 0$ and for all $s_1, \ldots, s_i \in S$ such that $s_0, s_1, \ldots, s_i$
are distinct, the conditional distribution of $X_{i+1}$ given the event
$(X_1 = s_1) \cap \cdots \cap (X_i = s_i)$ is the uniform distribution on $S$.

(b) Show that for every integer $n \ge 1$, we have $Y \ge n$ if and only if the random
variables $X_0, X_1, \ldots, X_{n-1}$ take on distinct values.

(c) From parts (a) and (b), show that for each $n = 1, \ldots, m$, we have
$$P[Y \ge n \mid Y \ge n - 1] = 1 - (n - 1)/m,$$
and conclude that
$$P[Y \ge n] = \prod_{i=1}^{n-1} (1 - i/m) \le e^{-n(n-1)/2m}.$$

(d) Using part (c), show that
$$E[Y] = \sum_{n \ge 1} P[Y \ge n] \le \sum_{n \ge 1} e^{-n(n-1)/2m} = O(m^{1/2}).$$

(e) Modify the above argument to show that $E[Y] = \Omega(m^{1/2})$.


Exercise 8.46. The setup for this exercise is identical to that of the previous
exercise, except that now, $F$ is uniformly distributed over the set of all $m!$ permutations of $S$.

(a) Show that if $Y = n$, then $X_n = X_0$.

(b) Show that for every $i \ge 0$ and all $s_1, \ldots, s_i \in S$ such that $s_0, s_1, \ldots, s_i$ are
distinct, the conditional distribution of $X_{i+1}$ given $(X_1 = s_1) \cap \cdots \cap (X_i = s_i)$
is essentially the uniform distribution on $S \setminus \{s_1, \ldots, s_i\}$.

(c) Show that for each $n = 2, \ldots, m$, we have

$$P[Y \ge n \mid Y \ge n - 1] = 1 - \frac{1}{m - n + 2},$$

and conclude that for all $n = 1, \ldots, m$, we have

$$P[Y \ge n] = \prod_{i=0}^{n-2} \Bigl(1 - \frac{1}{m - i}\Bigr) = 1 - \frac{n - 1}{m}.$$

(d) From part (c), show that $Y$ is uniformly distributed over $\{1, \ldots, m\}$, and in
particular, $E[Y] = (m + 1)/2$.
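The permutation case can be checked the same way. In this sketch (helper names are ours, with $S = \{0, \ldots, m-1\}$ and $s_0 = 0$), part (a) lets us compute $Y$ as the length of the cycle of the permutation that passes through $s_0$; tallying $Y$ over all $m!$ permutations for $m = 5$ exhibits the uniform distribution of part (d).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def first_return(perm, s0):
    """Y for a permutation perm (a tuple with perm[x] = F(x)).

    By part (a) the first repeated value is X_0 = s0, so Y is the length of the
    cycle of perm that contains s0.
    """
    n, x = 1, perm[s0]
    while x != s0:
        x = perm[x]
        n += 1
    return n


def distribution_of_y(m, s0=0):
    """How many of the m! permutations of {0, ..., m-1} give each value of Y."""
    return Counter(first_return(perm, s0) for perm in permutations(range(m)))


dist = distribution_of_y(5)
print(sorted(dist.items()))  # [(1, 24), (2, 24), (3, 24), (4, 24), (5, 24)]
mean = Fraction(sum(y * c for y, c in dist.items()), sum(dist.values()))
print(mean)  # 3, i.e. (m + 1)/2
```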
8.7 Hash functions

In this section, we apply the tools we have developed thus far to a particularly 
important area of computer science: the theory and practice of hashing. 

Let $R$, $S$, and $T$ be finite, non-empty sets. Suppose that for each $r \in R$, we have
a function $\Phi_r : S \to T$. We call $\{\Phi_r\}_{r \in R}$ a family of hash functions (from $S$ to $T$).
Elements of $R$ are called keys, and if $\Phi_r(s) = t$, we say that $s$ hashes to $t$ under $r$.

In applications of hash functions, we are typically interested in what happens
when various inputs are hashed under a randomly chosen key. To model such
situations, let $H$ be a random variable that is uniformly distributed over $R$, and for
each $s \in S$, let us define the random variable $\Phi_H(s)$, which takes the value $\Phi_r(s)$
when $H = r$.

• We say that the family of hash functions $\{\Phi_r\}_{r \in R}$ is pairwise independent
if the family of random variables $\{\Phi_H(s)\}_{s \in S}$ is pairwise independent, with
each $\Phi_H(s)$ uniformly distributed over $T$.

• We say that $\{\Phi_r\}_{r \in R}$ is universal if

$$P[\Phi_H(s) = \Phi_H(s')] \le 1/|T|$$

for all $s, s' \in S$ with $s \ne s'$.

We make a couple of simple observations. First, by Theorem 8.25, if the family
of hash functions $\{\Phi_r\}_{r \in R}$ is pairwise independent, then it is universal. Second, by
Theorem 8.10, if $|S| > 1$, then $\{\Phi_r\}_{r \in R}$ is pairwise independent if and only if the
following condition holds:

the random variable $(\Phi_H(s), \Phi_H(s'))$ is uniformly distributed over
$T \times T$, for all $s, s' \in S$ with $s \ne s'$;

or equivalently,

$$P[(\Phi_H(s) = t) \cap (\Phi_H(s') = t')] = 1/|T|^2 \quad \text{for all } s, s' \in S \text{ with } s \ne s', \text{ and for all } t, t' \in T.$$
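To make the two definitions concrete, the following brute-force checker (a sketch with hypothetical function names, practical only for very small $R$, $S$, $T$) represents a family as a dictionary from keys $r$ to lookup tables for $\Phi_r$. It tests pairwise independence through the $T \times T$ condition above, so it assumes $|S| > 1$, and it tests universality directly from the definition.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def is_pairwise_independent(family, S, T):
    """family maps each key r to a dict s -> Phi_r(s). Assumes |S| > 1.

    Checks that (Phi_H(s), Phi_H(s')) is uniform over T x T for all s != s'.
    """
    n_keys = len(family)
    for s, s2 in combinations(S, 2):
        counts = Counter((phi[s], phi[s2]) for phi in family.values())
        if any(Fraction(counts[(t, t2)], n_keys) != Fraction(1, len(T) ** 2)
               for t, t2 in product(T, repeat=2)):
            return False
    return True


def is_universal(family, S, T):
    """Checks P[Phi_H(s) = Phi_H(s')] <= 1/|T| for all s != s'."""
    n_keys = len(family)
    for s, s2 in combinations(S, 2):
        collisions = sum(1 for phi in family.values() if phi[s] == phi[s2])
        if Fraction(collisions, n_keys) > Fraction(1, len(T)):
            return False
    return True


S, T = [0, 1, 2], [0, 1]
all_functions = {f: dict(zip(S, f)) for f in product(T, repeat=len(S))}
constant_maps = {t: {s: t for s in S} for t in T}
print(is_pairwise_independent(all_functions, S, T), is_universal(all_functions, S, T))  # True True
print(is_pairwise_independent(constant_maps, S, T), is_universal(constant_maps, S, T))  # False False
```

The family of all functions from $S$ to $T$ passes both tests, while the family of constant maps fails both.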

Before looking at constructions of pairwise independent and universal families 
of hash functions, we briefly discuss two important applications. 

Example 8.34. Suppose $\{\Phi_r\}_{r \in R}$ is a universal family of hash functions from $S$
to $T$. One can implement a “dictionary” using a so-called hash table, which is
basically an array $A$ indexed by $T$, where each entry in $A$ is a list. Entries in the
dictionary are drawn from the set $S$. To insert a word $s \in S$ into the dictionary, $s$
is first hashed to an index $t$, and then $s$ is appended to the list $A[t]$; likewise, to see
if an arbitrary word $s \in S$ is in the dictionary, $s$ is first hashed to an index $t$, and
then the list $A[t]$ is searched for $s$.

Usually, the set of entries in the dictionary is much smaller than the set $S$. For
example, $S$ may consist of all bit strings of length up to, say, 2048, but the dictionary
may contain just a few thousand, or a few million, entries. Also, to be
practical, the set $T$ should not be too large.

Of course, all entries in the dictionary could end up hashing to the same index,
in which case, looking up a word in the dictionary degenerates into linear search.
However, we hope that this does not happen, and that entries hash to indices that
are nicely spread out over $T$. As we will now see, in order to ensure reasonable
performance (in an expected sense), $T$ needs to be of size roughly equal to the
number of entries in the dictionary.

Suppose we create a dictionary containing $n$ entries. Let $m := |T|$, and let $I \subseteq S$
be the set of entries (so $n = |I|$). These $n$ entries are inserted into the hash table
using a randomly chosen hash key, which we model as a random variable $H$ that
is uniformly distributed over $R$. For each $s \in S$, we define the random variable
$L_s$ to be the number of entries in $I$ that hash to the same index as $s$ under the key
$H$; that is, $L_s := |\{i \in I : \Phi_H(s) = \Phi_H(i)\}|$. Intuitively, $L_s$ measures the cost of
looking up the particular word $s$ in the dictionary. We want to bound $E[L_s]$. To this
end, we write $L_s$ as a sum of indicator variables: $L_s = \sum_{i \in I} C_{si}$, where $C_{si}$ is the
indicator variable for the event that $\Phi_H(s) = \Phi_H(i)$. By Theorem 8.16, we have
$E[C_{si}] = P[\Phi_H(s) = \Phi_H(i)]$; moreover, by the universal property, $E[C_{si}] \le 1/m$ if
$s \ne i$, and clearly, $E[C_{si}] = 1$ if $s = i$. By linearity of expectation, we have

$$E[L_s] = \sum_{i \in I} E[C_{si}].$$

If $s \notin I$, then each term in the sum is $\le 1/m$, and so $E[L_s] \le n/m$. If $s \in I$,
then one term in the sum is 1, and the other $n - 1$ terms are $\le 1/m$, and so
$E[L_s] \le 1 + (n - 1)/m$. In any case, we have

$$E[L_s] \le 1 + n/m.$$

In particular, this means that if $m \ge n$, then the expected cost of looking up any
particular word in the dictionary is bounded by a constant. □
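The dictionary of Example 8.34 can be sketched in Python as follows. The class name and interface are our own, and the caller supplies the hash function $\Phi_r$ for the chosen key. To check the bound $E[L_s] \le 1 + n/m$ exactly, the second function averages $L_s$ over the family of all functions from $S$ to $T$, which is pairwise independent and hence universal; it enumerates $m^{|S|}$ functions, so it is only meant for toy sizes.

```python
from fractions import Fraction
from itertools import product


class ChainedHashTable:
    """Example 8.34: an array A indexed by T = {0, ..., m-1}, where each entry is a list."""

    def __init__(self, m, phi):
        self.A = [[] for _ in range(m)]
        self.phi = phi  # the hash function Phi_r for the randomly chosen key r

    def insert(self, s):
        self.A[self.phi(s)].append(s)

    def contains(self, s):
        return s in self.A[self.phi(s)]  # linear search of a single list


def expected_chain_length(S, m, I, s):
    """Exact E[L_s] when H selects a uniformly random function from S = range(|S|) to {0, ..., m-1}."""
    functions = list(product(range(m), repeat=len(S)))
    total = sum(sum(1 for i in I if f[s] == f[i]) for f in functions)
    return Fraction(total, len(functions))


table = ChainedHashTable(3, lambda s: (2 * s + 1) % 3)  # one fixed hash function, for illustration
for word in (0, 1, 2):
    table.insert(word)
print(table.contains(1), table.contains(4))  # True False

S, m, I = range(5), 3, [0, 1, 2]
print([expected_chain_length(S, m, I, s) for s in S], 1 + Fraction(len(I), m))
```

With $|S| = 5$, $m = 3$ and $I = \{0, 1, 2\}$, words in $I$ have $E[L_s] = 5/3$ and the other words have $E[L_s] = 1$, both within the bound $1 + n/m = 2$.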

Example 8.35. Suppose Alice wants to send a message to Bob in such a way that 
Bob can be reasonably sure that the message he receives really came from Alice, 
and was not modified in transit by some malicious adversary. We present a solution 
to this problem here that works assuming that Alice and Bob share a randomly 
generated secret key, and that this key is used to authenticate just a single message 
(multiple messages can be authenticated using multiple keys). 

Suppose that $\{\Phi_r\}_{r \in R}$ is a pairwise independent family of hash functions from
$S$ to $T$. We model the shared random key as a random variable $H$, uniformly
distributed over $R$. We also model Alice’s message as a random variable $X$, taking
values in the set $S$. We make no assumption about the distribution of $X$, but we do
assume that $X$ and $H$ are independent. When Alice sends the message $X$ to Bob,
she also sends the “authentication tag” $Y := \Phi_H(X)$. Now, when Bob receives a
message $X'$ and tag $Y'$, he checks that $\Phi_H(X') = Y'$; if this holds, he accepts the
message $X'$ as authentic; otherwise, he rejects it. Here, $X'$ and $Y'$ are also random
variables; however, they may have been created by a malicious adversary who may
have even created them after seeing $X$ and $Y$. We can model such an adversary as
a pair of functions $f : S \times T \to S$ and $g : S \times T \to T$, so that $X' := f(X, Y)$ and
$Y' := g(X, Y)$. The idea is that after seeing $X$ and $Y$, the adversary computes $X'$ and
$Y'$ and sends $X'$ and $Y'$ to Bob instead of $X$ and $Y$. Let us say that the adversary
fools Bob if $\Phi_H(X') = Y'$ and $X' \ne X$. We will show that $P[\mathcal{F}] \le 1/m$, where $\mathcal{F}$ is
the event that the adversary fools Bob, and $m := |T|$. Intuitively, this bound holds
because the pairwise independence property guarantees that after seeing the value
of $\Phi_H$ at one input, the value of $\Phi_H$ at any other input is completely unpredictable,
and cannot be guessed with probability any better than $1/m$. If $m$ is chosen to be
suitably large, the probability that Bob gets fooled can be made acceptably small.
For example, $S$ may consist of all bit strings of length up to, say, 2048, while the set
$T$ may be encoded using much shorter bit strings, of length, say, 64. This is nice,
as it means that the authentication tags consume very little additional bandwidth.

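A straightforward calculation justifies the claim that $P[\mathcal{F}] \le 1/m$:

$$\begin{aligned}
P[\mathcal{F}] &= \sum_{s \in S} \sum_{t \in T} P\bigl[(X = s) \cap (Y = t) \cap \mathcal{F}\bigr] \quad \text{(law of total probability (8.9))} \\
&= \sum_{s \in S} \sum_{t \in T} P\bigl[(X = s) \cap (\Phi_H(s) = t) \cap (\Phi_H(f(s, t)) = g(s, t)) \cap (f(s, t) \ne s)\bigr] \\
&= \sum_{s \in S} \sum_{t \in T} P[X = s] \, P\bigl[(\Phi_H(s) = t) \cap (\Phi_H(f(s, t)) = g(s, t)) \cap (f(s, t) \ne s)\bigr] \quad \text{(since $X$ and $H$ are independent)} \\
&\le \sum_{s \in S} \sum_{t \in T} P[X = s] \cdot (1/m^2) \quad \text{(since $\{\Phi_r\}_{r \in R}$ is pairwise independent)} \\
&= (1/m) \sum_{s \in S} P[X = s] = 1/m. \qquad \square
\end{aligned}$$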
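The bound $P[\mathcal{F}] \le 1/m$ can be confirmed by exhaustive search in a toy setting. The sketch below (our own illustration, with hypothetical helper names) takes $S = T = \mathbb{Z}_p$ and the affine family $s \mapsto r_0 + r_1 s$, which Example 8.36 shows to be pairwise independent. For a fixed message $x$ and each observed tag $t$, it lets the adversary pick the forgery $(x', t')$ with $x' \ne x$ that agrees with the most keys, which is the best any choice of $f$ and $g$ can do.

```python
from fractions import Fraction
from itertools import product


def tag(key, msg, p):
    """Authentication tag Phi_r(msg) = r_0 + r_1 * msg in Z_p, for key r = (r_0, r_1)."""
    r0, r1 = key
    return (r0 + r1 * msg) % p


def best_forgery_probability(p, x):
    """Exact probability that an optimal adversary fools Bob when Alice's message is x.

    H is uniform over Z_p x Z_p. For each tag t the adversary may see, it outputs the pair
    (x2, t2) with x2 != x that is consistent with the largest number of keys.
    """
    keys = list(product(range(p), repeat=2))
    wins = 0
    for t in range(p):
        consistent = [k for k in keys if tag(k, x, p) == t]
        wins += max(
            sum(1 for k in consistent if tag(k, x2, p) == t2)
            for x2 in range(p) if x2 != x
            for t2 in range(p)
        )
    return Fraction(wins, len(keys))


print(best_forgery_probability(7, 3))  # 1/7
```

Since the optimal adversary succeeds with probability exactly $1/p$ for every fixed message, averaging over any distribution of $X$ that is independent of $H$ gives the same value.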


We now present several constructions of pairwise independent and universal 
families of hash functions. 


Example 8.36. By setting $k := 2$ in Example 8.27, for each prime $p$, we immediately
get a pairwise independent family of hash functions $\{\Phi_r\}_{r \in R}$ from $\mathbb{Z}_p$ to $\mathbb{Z}_p$,
where $R = \mathbb{Z}_p \times \mathbb{Z}_p$, and for $r = (r_0, r_1) \in R$, the hash function $\Phi_r$ is given by

$$\Phi_r : \mathbb{Z}_p \to \mathbb{Z}_p, \qquad s \mapsto r_0 + r_1 s. \qquad \square$$
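A quick way to see Example 8.36 at work: for $s \ne s'$, pairwise independence amounts to the map $(r_0, r_1) \mapsto (\Phi_r(s), \Phi_r(s'))$ being a bijection of $\mathbb{Z}_p \times \mathbb{Z}_p$. The sketch below (hypothetical helper names) checks this by counting images; it also shows that the check fails for the composite modulus 4, where $r_1(s' - s)$ need not range over all residues.

```python
from itertools import product


def affine_hash(r, s, p):
    """Phi_r(s) = r_0 + r_1 s in Z_p, for r = (r_0, r_1)."""
    r0, r1 = r
    return (r0 + r1 * s) % p


def pair_map_is_bijective(p, s, s2):
    """True iff (r_0, r_1) -> (Phi_r(s), Phi_r(s2)) hits each of the p*p pairs exactly once."""
    images = {(affine_hash(r, s, p), affine_hash(r, s2, p)) for r in product(range(p), repeat=2)}
    return len(images) == p * p


p = 7
print(all(pair_map_is_bijective(p, s, s2) for s in range(p) for s2 in range(p) if s != s2))  # True
print(pair_map_is_bijective(4, 0, 2))  # False: the modulus 4 is not prime
```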

While very simple and elegant, the family of hash functions in Example 8.36 is 
not very useful in practice. As we saw in Examples 8.34 and 8.35, what we would 
really like are families of hash functions that hash long inputs to short outputs. The 
next example provides us with a pairwise independent family of hash functions that 
satisfies this requirement. 

Example 8.37. Let $p$ be a prime, and let $\ell$ be a positive integer. Let $S := \mathbb{Z}_p^{\times \ell}$ and
$R := \mathbb{Z}_p^{\times (\ell+1)}$. For each $r = (r_0, r_1, \ldots, r_\ell) \in R$, we define the hash function

$$\Phi_r : S \to \mathbb{Z}_p, \qquad (s_1, \ldots, s_\ell) \mapsto r_0 + r_1 s_1 + \cdots + r_\ell s_\ell.$$

We will show that $\{\Phi_r\}_{r \in R}$ is a pairwise independent family of hash functions
from $S$ to $\mathbb{Z}_p$. To this end, let $H$ be a random variable uniformly distributed over
$R$. We want to show that for each $s, s' \in S$ with $s \ne s'$, the random variable
$(\Phi_H(s), \Phi_H(s'))$ is uniformly distributed over $\mathbb{Z}_p \times \mathbb{Z}_p$. So let $s \ne s'$ be fixed, and
define the function

$$\rho : R \to \mathbb{Z}_p \times \mathbb{Z}_p, \qquad r \mapsto (\Phi_r(s), \Phi_r(s')).$$

Because $\rho$ is a group homomorphism, it will suffice to show that $\rho$ is surjective (see
Theorem 8.5). Suppose $s = (s_1, \ldots, s_\ell)$ and $s' = (s'_1, \ldots, s'_\ell)$. Since $s \ne s'$, we
must have $s_j \ne s'_j$ for some $j = 1, \ldots, \ell$. For this $j$, consider the function

$$\rho' : R \to \mathbb{Z}_p \times \mathbb{Z}_p, \qquad (r_0, r_1, \ldots, r_\ell) \mapsto (r_0 + r_j s_j, \; r_0 + r_j s'_j).$$

Evidently, the image of $\rho$ includes the image of $\rho'$, and by Example 8.36, the function
$\rho'$ is surjective. □

To use the construction in Example 8.37 in applications where the set of inputs
consists of bit strings of a given length, one can naturally split such a bit string up
into short bit strings which, when viewed as integers, lie in the set $\{0, \ldots, p - 1\}$,
and which can in turn be viewed as elements of $\mathbb{Z}_p$. This gives us a natural, injective
map from bit strings to elements of $\mathbb{Z}_p^{\times \ell}$. The appropriate choice of the prime $p$
depends on the application. Of course, the requirement that $p$ is prime limits our
choice in the size of the output set; however, this is usually not a severe restriction,
as Bertrand’s postulate (Theorem 5.8) tells us that we can always choose $p$
to within a factor of 2 of any desired value of the output set size. Nevertheless,
the construction in the following example gives us a universal (but not pairwise
independent) family of hash functions with an output set of any size we wish.
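The following Python sketch combines the chunking idea with the hash functions of Example 8.37. The helper names, the choice $p = 2^{61} - 1$ (a Mersenne prime) and the 60-bit chunk size are illustrative assumptions, not prescribed by the text; any prime $p$ with $2^k \le p$ works for $k$-bit chunks. Keys are drawn with Python's `secrets` module, and a brute-force routine confirms pairwise independence for a tiny prime.

```python
import secrets
from collections import Counter
from itertools import combinations, product


def split_into_chunks(bits, k):
    """Split a bit string such as '1011...' into k-bit pieces, read as integers in {0, ..., 2**k - 1}.

    The last piece is padded with zeros on the right; for inputs of one fixed length this map is injective.
    """
    bits += "0" * (-len(bits) % k)
    return [int(bits[i:i + k], 2) for i in range(0, len(bits), k)]


def inner_product_hash(r, s, p):
    """Phi_r(s_1, ..., s_l) = r_0 + r_1 s_1 + ... + r_l s_l in Z_p, with r = (r_0, ..., r_l)."""
    assert len(r) == len(s) + 1
    return (r[0] + sum(ri * si for ri, si in zip(r[1:], s))) % p


def random_key(l, p):
    """Draw a key uniformly from R = Z_p^(l+1)."""
    return [secrets.randbelow(p) for _ in range(l + 1)]


def is_pairwise_independent(p, l):
    """Brute force: is (Phi_H(s), Phi_H(s')) uniform on Z_p x Z_p for every pair s != s'?"""
    keys = list(product(range(p), repeat=l + 1))
    inputs = list(product(range(p), repeat=l))
    for s1, s2 in combinations(inputs, 2):
        counts = Counter((inner_product_hash(r, s1, p), inner_product_hash(r, s2, p)) for r in keys)
        if len(counts) != p * p or len(set(counts.values())) != 1:
            return False
    return True


# Illustration only: p = 2**61 - 1 is prime, and 60-bit chunks are all smaller than p.
p, k = 2**61 - 1, 60
chunks = split_into_chunks("1" * 2048, k)  # 35 chunks, each an element of Z_p
key = random_key(len(chunks), p)
print(len(chunks), inner_product_hash(key, chunks, p))
print(is_pairwise_independent(3, 2))  # True
```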

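Example 8.38. Let $p$ be a prime, and let $m$ be an arbitrary positive integer. Let
us introduce some convenient notation: for $\alpha \in \mathbb{Z}_p$, let $[\![\alpha]\!]_m := [\mathrm{rep}(\alpha)]_m \in \mathbb{Z}_m$
(recall that $\mathrm{rep}(\alpha)$ denotes the unique integer $a \in \{0, \ldots, p - 1\}$ such that $\alpha = [a]_p$).
Let $R := \mathbb{Z}_p \times \mathbb{Z}_p^*$, and for each $r = (r_0, r_1) \in R$, define the hash function

$$\Phi_r : \mathbb{Z}_p \to \mathbb{Z}_m, \qquad s \mapsto [\![r_0 + r_1 s]\!]_m.$$

Our goal is to show that $\{\Phi_r\}_{r \in R}$ is a universal family of hash functions from $\mathbb{Z}_p$ to
$\mathbb{Z}_m$. So let $s, s' \in \mathbb{Z}_p$ with $s \ne s'$, let $H_0$ and $H_1$ be independent random variables,
with $H_0$ uniformly distributed over $\mathbb{Z}_p$ and $H_1$ uniformly distributed over $\mathbb{Z}_p^*$, and let
$H := (H_0, H_1)$. Also, let $\mathcal{C}$ be the event that $\Phi_H(s) = \Phi_H(s')$. We want to show that
$P[\mathcal{C}] \le 1/m$. Let us define random variables $Y := H_0 + H_1 s$ and $Y' := H_0 + H_1 s'$.
Also, let $\hat{s} := s' - s \ne 0$. Then we have

$$\begin{aligned}
P[\mathcal{C}] &= P\bigl[[\![Y]\!]_m = [\![Y']\!]_m\bigr] \\
&= P\bigl[[\![Y]\!]_m = [\![Y + H_1 \hat{s}]\!]_m\bigr] \quad \text{(since $Y' = Y + H_1 \hat{s}$)} \\
&= \sum_{\alpha \in \mathbb{Z}_p} P\bigl[([\![Y]\!]_m = [\![Y + H_1 \hat{s}]\!]_m) \cap (Y = \alpha)\bigr] \quad \text{(law of total probability (8.9))} \\
&= \sum_{\alpha \in \mathbb{Z}_p} P\bigl[([\![\alpha]\!]_m = [\![\alpha + H_1 \hat{s}]\!]_m) \cap (Y = \alpha)\bigr] \\
&= \sum_{\alpha \in \mathbb{Z}_p} P\bigl[[\![\alpha]\!]_m = [\![\alpha + H_1 \hat{s}]\!]_m\bigr] \, P[Y = \alpha] \quad \text{(by Theorem 8.13, $Y$ and $H_1$ are independent).}
\end{aligned}$$

It will suffice to show that

$$P\bigl[[\![\alpha]\!]_m = [\![\alpha + H_1 \hat{s}]\!]_m\bigr] \le 1/m \tag{8.33}$$

for each $\alpha \in \mathbb{Z}_p$, since then

$$P[\mathcal{C}] \le \sum_{\alpha \in \mathbb{Z}_p} (1/m) \, P[Y = \alpha] = (1/m) \sum_{\alpha \in \mathbb{Z}_p} P[Y = \alpha] = 1/m.$$

So consider a fixed $\alpha \in \mathbb{Z}_p$. As $\hat{s} \ne 0$ and $H_1$ is uniformly distributed over $\mathbb{Z}_p^*$, it
follows that $H_1 \hat{s}$ is uniformly distributed over $\mathbb{Z}_p^*$, and hence $\alpha + H_1 \hat{s}$ is uniformly
distributed over the set $\mathbb{Z}_p \setminus \{\alpha\}$. Let $M_\alpha := \{\beta \in \mathbb{Z}_p : [\![\alpha]\!]_m = [\![\beta]\!]_m\}$. To prove
(8.33), we need to show that $|M_\alpha \setminus \{\alpha\}| \le (p - 1)/m$. But it is easy to see that
$|M_\alpha| \le \lceil p/m \rceil$, and since $M_\alpha$ certainly contains $\alpha$, we have

$$|M_\alpha \setminus \{\alpha\}| \le \Bigl\lceil \frac{p}{m} \Bigr\rceil - 1 \le \frac{p}{m} + \frac{m - 1}{m} - 1 = \frac{p - 1}{m}. \qquad \square$$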

One drawback of the family of hash functions in the previous example is that the 
prime p may need to be quite large (at least as large as the size of the set of inputs) 
and so to evaluate a hash function, we have to perform modular multiplication of 
large integers. In contrast, in Example 8.37, the prime p can be much smaller 
(only as large as the size of the set of outputs), and so these hash functions can be 
evaluated much more quickly. 

Another consideration in designing families of hash functions is the size of the key
set. The following example gives a variant of the family in Example 8.37 that uses
a somewhat smaller key set (relative to the size of the input), but is only a universal
family, and not a pairwise independent family.


Example 8.39. Let $p$ be a prime, and let $\ell$ be a positive integer. Let $S := \mathbb{Z}_p^{\times (\ell+1)}$
and $R := \mathbb{Z}_p^{\times \ell}$. For each $r = (r_1, \ldots, r_\ell) \in R$, we define the hash function

$$\Phi_r : S \to \mathbb{Z}_p, \qquad (s_0, s_1, \ldots, s_\ell) \mapsto s_0 + r_1 s_1 + \cdots + r_\ell s_\ell.$$

Our goal is to show that $\{\Phi_r\}_{r \in R}$ is a universal family of hash functions from
$S$ to $\mathbb{Z}_p$. So let $s, s' \in S$ with $s \ne s'$, and let $H$ be a random variable that is
uniformly distributed over $R$. We want to show that $P[\Phi_H(s) = \Phi_H(s')] \le 1/p$. Let
$s = (s_0, s_1, \ldots, s_\ell)$ and $s' = (s'_0, s'_1, \ldots, s'_\ell)$, and set $\hat{s}_i := s'_i - s_i$ for $i = 0, 1, \ldots, \ell$.
Let us define the function

$$\rho : R \to \mathbb{Z}_p, \qquad (r_1, \ldots, r_\ell) \mapsto r_1 \hat{s}_1 + \cdots + r_\ell \hat{s}_\ell.$$

Clearly, $\Phi_H(s) = \Phi_H(s')$ if and only if $\rho(H) = -\hat{s}_0$. Moreover, $\rho$ is a group
homomorphism. There are two cases to consider. In the first case, $\hat{s}_i = 0$ for all
$i = 1, \ldots, \ell$; in this case, the image of $\rho$ is $\{0\}$, but $\hat{s}_0 \ne 0$ (since $s \ne s'$), and
so $P[\rho(H) = -\hat{s}_0] = 0$. In the second case, $\hat{s}_i \ne 0$ for some $i = 1, \ldots, \ell$; in
this case, the image of $\rho$ is $\mathbb{Z}_p$, and so $\rho(H)$ is uniformly distributed over $\mathbb{Z}_p$ (see
Theorem 8.5); thus, $P[\rho(H) = -\hat{s}_0] = 1/p$. □
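The universality of the family in Example 8.39 can likewise be checked by enumeration for tiny parameters. In this sketch (hypothetical names), the worst-case collision probability comes out to exactly $1/p$. The family is not pairwise independent: the all-zero input hashes to 0 under every key, so $\Phi_H(0, \ldots, 0)$ is not uniform.

```python
from fractions import Fraction
from itertools import combinations, product


def phi(r, s, p):
    """Phi_r(s_0, s_1, ..., s_l) = s_0 + r_1 s_1 + ... + r_l s_l in Z_p, for r = (r_1, ..., r_l)."""
    return (s[0] + sum(ri * si for ri, si in zip(r, s[1:]))) % p


def max_collision_probability(p, l):
    keys = list(product(range(p), repeat=l))        # R = Z_p^l
    inputs = list(product(range(p), repeat=l + 1))  # S = Z_p^(l+1)
    worst = Fraction(0)
    for s, s2 in combinations(inputs, 2):
        hits = sum(1 for r in keys if phi(r, s, p) == phi(r, s2, p))
        worst = max(worst, Fraction(hits, len(keys)))
    return worst


print(max_collision_probability(3, 2))  # 1/3
# Not pairwise independent: the all-zero input hashes to 0 under every key.
print({phi(r, (0, 0, 0), 3) for r in product(range(3), repeat=2)})  # {0}
```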


One can get significantly smaller key sets, if one is willing to relax the definitions
of universal and pairwise independence. Let $\{\Phi_r\}_{r \in R}$ be a family of hash
functions from $S$ to $T$, where $m := |T|$. Let $H$ be a random variable that is
uniformly distributed over $R$. We say that $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-almost universal if for
all $s, s' \in S$ with $s \ne s'$, we have $P[\Phi_H(s) = \Phi_H(s')] \le \varepsilon$. Thus, $\{\Phi_r\}_{r \in R}$ is
universal if and only if it is $1/m$-almost universal. We say that $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-almost
strongly universal if $\Phi_H(s)$ is uniformly distributed over $T$ for each $s \in S$, and
$P[(\Phi_H(s) = t) \cap (\Phi_H(s') = t')] \le \varepsilon/m$ for all $s, s' \in S$ with $s \ne s'$ and all $t, t' \in T$.
Constructions, properties, and applications of these types of hash functions are
developed in some of the exercises below.


Exercise 8.47. For each positive integer $n$, let $I_n$ denote $\{0, \ldots, n - 1\}$. Let $m$
be a power of a prime, $\ell$ be a positive integer, $S := I_m^{\times \ell}$, and $R := I_{m^2}^{\times (\ell+1)}$. For
each $r = (r_0, r_1, \ldots, r_\ell) \in R$, define the hash function

$$\Phi_r : S \to I_m, \qquad (s_1, \ldots, s_\ell) \mapsto \Bigl\lfloor \frac{(r_0 + r_1 s_1 + \cdots + r_\ell s_\ell) \bmod m^2}{m} \Bigr\rfloor.$$

Using the result from Exercise 2.13, show that $\{\Phi_r\}_{r \in R}$ is a pairwise independent
family of hash functions from $S$ to $I_m$. Note that on a typical computer, if $m$ is a
suitable power of 2, then it is very easy to evaluate these hash functions, using just
multiplications, additions, shifts, and masks (no divisions).
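To illustrate the remark about shifts and masks, here is a Python sketch (function names are our own) of the hash functions of Exercise 8.47. When $m = 2^k$, reduction modulo $m^2$ is a bitwise AND with $2^{2k} - 1$, and division by $m$ is a right shift by $k$; a generic version with `%` and `//` serves for other $m$. The brute-force check is evidence for small cases, not a solution to the exercise; it also rejects $m = 6$, which is not a prime power, suggesting that the hypothesis on $m$ is doing real work.

```python
from collections import Counter
from itertools import combinations, product


def phi_pow2(r, s, k):
    """Exercise 8.47 with m = 2**k, using only *, +, a mask and a shift.

    (acc mod m^2) is acc & (2**(2k) - 1), and floor(x / m) is x >> k.
    """
    acc = r[0]
    for ri, si in zip(r[1:], s):
        acc += ri * si
    return (acc & ((1 << (2 * k)) - 1)) >> k


def phi_generic(r, s, m):
    """The same hash for arbitrary m: floor(((r_0 + r_1 s_1 + ... + r_l s_l) mod m^2) / m)."""
    acc = r[0] + sum(ri * si for ri, si in zip(r[1:], s))
    return (acc % (m * m)) // m


def is_pairwise_independent(m, l):
    """Brute force: is (Phi_H(s), Phi_H(s')) uniform on I_m x I_m for every pair s != s'?"""
    keys = list(product(range(m * m), repeat=l + 1))  # R = I_{m^2}^(l+1)
    inputs = list(product(range(m), repeat=l))        # S = I_m^l
    for s, s2 in combinations(inputs, 2):
        counts = Counter((phi_generic(r, s, m), phi_generic(r, s2, m)) for r in keys)
        if len(counts) != m * m or len(set(counts.values())) != 1:
            return False
    return True


print([is_pairwise_independent(m, 1) for m in (2, 3, 4, 6)])  # [True, True, True, False]
```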


Exercise 8.48. Let $\{\Phi_r\}_{r \in R}$ be an $\varepsilon$-almost universal family of hash functions
from $S$ to $T$. Also, let $H, X, X'$ be random variables, where $H$ is uniformly distributed
over $R$, and both $X$ and $X'$ take values in $S$. Moreover, assume $H$ and
$(X, X')$ are independent. Show that $P[\Phi_H(X) = \Phi_H(X')] \le P[X = X'] + \varepsilon$.


Exercise 8.49. Let $\{\Phi_r\}_{r \in R}$ be an $\varepsilon$-almost universal family of hash functions
from $S$ to $T$, and let $H$ be a random variable that is uniformly distributed over $R$.
Let $I$ be a subset of $S$ of size $n > 0$. Let $\mathcal{C}$ be the event that $\Phi_H(i) = \Phi_H(j)$
for some $i, j \in I$ with $i \ne j$. We define several random variables: for each
$t \in T$, $N_t := |\{i \in I : \Phi_H(i) = t\}|$; $M := \max\{N_t : t \in T\}$; for each $s \in S$,
$L_s := |\{i \in I : \Phi_H(s) = \Phi_H(i)\}|$. Show that:

(a) $P[\mathcal{C}] \le \varepsilon n(n - 1)/2$;

(b) $E[M] \le \sqrt{\varepsilon n^2 + n}$;

(c) for each $s \in S$, $E[L_s] \le 1 + \varepsilon n$.


The results of the previous exercise show that for many applications, the
$\varepsilon$-almost universal property is good enough, provided $\varepsilon$ is suitably small. The next
three exercises develop $\varepsilon$-almost universal families of hash functions with very
small sets of keys, even when $\varepsilon$ is quite small.

Exercise 8.50. Let $p$ be a prime, and let $\ell$ be a positive integer. Let $S := \mathbb{Z}_p^{\times (\ell+1)}$.
For each $r \in \mathbb{Z}_p$, define the hash function

$$\Phi_r : S \to \mathbb{Z}_p, \qquad (s_0, s_1, \ldots, s_\ell) \mapsto s_0 + s_1 r + \cdots + s_\ell r^\ell.$$

Show that $\{\Phi_r\}_{r \in \mathbb{Z}_p}$ is an $\ell/p$-almost universal family of hash functions from $S$ to
$\mathbb{Z}_p$.
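Polynomial hashing as in Exercise 8.50 is usually evaluated by Horner's rule. The sketch below (our own names) does this and, for a tiny prime, finds the largest collision probability by enumeration; it equals $\ell/p$, since some pairs $s \ne s'$ differ by a polynomial with $\ell$ distinct roots in $\mathbb{Z}_p$.

```python
from fractions import Fraction
from itertools import combinations, product


def poly_hash(r, s, p):
    """Phi_r(s_0, s_1, ..., s_l) = s_0 + s_1 r + ... + s_l r^l mod p, via Horner's rule."""
    acc = 0
    for coeff in reversed(s):
        acc = (acc * r + coeff) % p
    return acc


def max_collision_probability(p, l):
    """max over s != s' in Z_p^(l+1) of P[Phi_H(s) = Phi_H(s')], with H uniform over Z_p."""
    inputs = list(product(range(p), repeat=l + 1))
    worst = Fraction(0)
    for s, s2 in combinations(inputs, 2):
        hits = sum(1 for r in range(p) if poly_hash(r, s, p) == poly_hash(r, s2, p))
        worst = max(worst, Fraction(hits, p))
    return worst


print(poly_hash(2, (1, 2, 3), 7))       # 1 + 2*2 + 3*4 = 17, which is 3 mod 7
print(max_collision_probability(5, 2))  # 2/5, i.e. l/p
```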

Exercise 8.51. Let $\{\Phi_r\}_{r \in R}$ be an $\varepsilon$-almost universal family of hash functions
from $S$ to $T$. Let $\{\Phi'_{r'}\}_{r' \in R'}$ be an $\varepsilon'$-almost universal family of hash functions from
$S'$ to $T'$, where $T \subseteq S'$. Show that

$$\{\Phi'_{r'} \circ \Phi_r\}_{(r, r') \in R \times R'}$$

is an $(\varepsilon + \varepsilon')$-almost universal family of hash functions from $S$ to $T'$ (here, “$\circ$”
denotes function composition).

Exercise 8.52. Let $m$ and $\ell$ be positive integers, and let $0 < \alpha < 1$. Given these
parameters, show how to construct an $\varepsilon$-almost universal family of hash functions
$\{\Phi_r\}_{r \in R}$ from $\mathbb{Z}_m^{\times \ell}$ to $\mathbb{Z}_m$, such that

$$\varepsilon \le (1 + \alpha)/m \quad \text{and} \quad \log |R| = O(\log m + \log \ell + \log(1/\alpha)).$$

Hint: use the previous two exercises, and Example 8.38.

Exercise 8.53. Let $\{\Phi_r\}_{r \in R}$ be an $\varepsilon$-almost universal family of hash functions
from $S$ to $T$. Show that $\varepsilon \ge 1/|T| - 1/|S|$.

Exercise 8.54. Let $\{\Phi_r\}_{r \in R}$ be a family of hash functions from $S$ to $T$, with
$m := |T|$. Show that:

(a) if $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-almost strongly universal, then it is $\varepsilon$-almost universal;

(b) if $\{\Phi_r\}_{r \in R}$ is pairwise independent, then it is $1/m$-almost strongly universal;

(c) if $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-almost universal, and $\{\Phi'_{r'}\}_{r' \in R'}$ is an $\varepsilon'$-almost strongly
universal family of hash functions from $S'$ to $T'$, where $T \subseteq S'$, then
$\{\Phi'_{r'} \circ \Phi_r\}_{(r, r') \in R \times R'}$ is an $(\varepsilon + \varepsilon')$-almost strongly universal family of hash
functions from $S$ to $T'$.

Exercise 8.55. Show that if an $\varepsilon$-almost strongly universal family of hash functions
is used in Example 8.35, then Bob gets fooled with probability at most $\varepsilon$.

Exercise 8.56. Show how to construct an $\varepsilon$-almost strongly universal family of
hash functions satisfying the same bounds as in Exercise 8.52, under the restriction
that $m$ is a prime power.
Exercise 8.57. Let $p$ be a prime, and let $\ell$ be a positive integer. Let $S := \mathbb{Z}_p^{\times \ell}$
and $R := \mathbb{Z}_p \times \mathbb{Z}_p$. For each $r = (r_0, r_1) \in R$, define the hash function

$$\Phi_r : S \to \mathbb{Z}_p, \qquad (s_1, \ldots, s_\ell) \mapsto r_0 + s_1 r_1 + \cdots + s_\ell r_1^\ell.$$

Show that $\{\Phi_r\}_{r \in R}$ is an $\ell/p$-almost strongly universal family of hash functions
from $S$ to $\mathbb{Z}_p$.


8.8 Statistical distance 

This section discusses a useful measure of “distance” between two random vari- 
ables. Although important in many applications, the results of this section (and the 
next) will play only a very minor role in the remainder of the text. 

Let $X$ and $Y$ be random variables which both take values in a finite set $S$. We
define the statistical distance between $X$ and $Y$ as

$$\Delta[X; Y] := \frac{1}{2} \sum_{s \in S} \bigl|P[X = s] - P[Y = s]\bigr|.$$

Theorem 8.30. For random variables $X$, $Y$, $Z$, we have

(i) $0 \le \Delta[X; Y] \le 1$,

(ii) $\Delta[X; X] = 0$,

(iii) $\Delta[X; Y] = \Delta[Y; X]$, and

(iv) $\Delta[X; Z] \le \Delta[X; Y] + \Delta[Y; Z]$.

Proof. Exercise. □


It is also clear from the definition that $\Delta[X; Y]$ depends only on the distributions
of $X$ and $Y$, and not on any other properties. As such, we may sometimes speak of
the statistical distance between two distributions, rather than between two random
variables.

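Example 8.40. Suppose $X$ has the uniform distribution on $\{1, \ldots, m\}$, and $Y$ has
the uniform distribution on $\{1, \ldots, m - \delta\}$, where $\delta \in \{0, \ldots, m - 1\}$. Let us
compute $\Delta[X; Y]$. We could apply the definition directly; however, consider the
following graph of the distributions of $X$ and $Y$:

[Figure: the distributions of $X$ (height $1/m$ over $1, \ldots, m$) and $Y$ (height $1/(m - \delta)$ over $1, \ldots, m - \delta$), with marks at $0$, $m - \delta$, and $m$ on the horizontal axis; region $B$ lies under both graphs, region $A$ is the part of $Y$'s graph above height $1/m$, and region $C$ is the part of $X$'s graph over $m - \delta + 1, \ldots, m$.]

The statistical distance between $X$ and $Y$ is just $1/2$ times the area of regions $A$
and $C$ in the diagram. Moreover, because probability distributions sum to 1, we
must have

$$\text{area of } B + \text{area of } A = 1 = \text{area of } B + \text{area of } C,$$

and hence, the areas of region $A$ and region $C$ are the same. Therefore,

$$\Delta[X; Y] = \text{area of } A = \text{area of } C = \delta/m. \qquad \square$$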
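The definition of $\Delta$ translates directly into code. This sketch (hypothetical names) represents a distribution on a finite set as a dictionary of exact `Fraction` probabilities and reproduces the value $\delta/m$ of Example 8.40.

```python
from fractions import Fraction


def statistical_distance(P, Q):
    """Delta[X; Y] = (1/2) * sum_{s in S} |P[X = s] - P[Y = s]|, with distributions as dicts."""
    support = set(P) | set(Q)
    return Fraction(1, 2) * sum(abs(P.get(s, 0) - Q.get(s, 0)) for s in support)


def uniform(values):
    values = list(values)
    return {v: Fraction(1, len(values)) for v in values}


m, delta = 10, 3
X = uniform(range(1, m + 1))
Y = uniform(range(1, m - delta + 1))
print(statistical_distance(X, Y))  # 3/10, i.e. delta/m
```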

The following characterization of statistical distance is quite useful: 

Theorem 8.31. Let $X$ and $Y$ be random variables taking values in a set $S$. For
every $S' \subseteq S$, we have

$$\Delta[X; Y] \ge |P[X \in S'] - P[Y \in S']|,$$

and equality holds for some $S' \subseteq S$, and in particular, for the set

$$S' := \{s \in S : P[X = s] < P[Y = s]\},$$

as well as its complement.

Proof. Suppose we split the set $S$ into two disjoint subsets: the set $S_0$ consisting
of those $s \in S$ such that $P[X = s] < P[Y = s]$, and the set $S_1$ consisting of those
$s \in S$ such that $P[X = s] \ge P[Y = s]$. Consider the following rough graph of
the distributions of $X$ and $Y$, where the elements of $S_0$ are placed to the left of the
elements of $S_1$:

Now, as in Example 8.40,

$$\Delta[X; Y] = \text{area of } A = \text{area of } C.$$

Now consider any subset $S'$ of $S$, and observe that

$$P[X \in S'] - P[Y \in S'] = \text{area of } C' - \text{area of } A',$$

where $C'$ is the subregion of $C$ that lies above $S'$, and $A'$ is the subregion of $A$ that
lies above $S'$. It follows that $|P[X \in S'] - P[Y \in S']|$ is maximized when $S' = S_0$
or $S' = S_1$, in which case it is equal to $\Delta[X; Y]$. □

We can restate Theorem 8.31 as follows:

$$\Delta[X; Y] = \max\{|P[\phi(X)] - P[\phi(Y)]| : \phi \text{ is a predicate on } S\}.$$

This implies that when $\Delta[X; Y]$ is very small, then for every predicate $\phi$, the events
$\phi(X)$ and $\phi(Y)$ occur with almost the same probability. Put another way, there is no
“statistical test” that can effectively distinguish between the distributions of $X$ and
$Y$. For many applications, this means that the distribution of $X$ is “for all practical
purposes” equivalent to that of $Y$, and hence in analyzing the behavior of $X$, we can
instead analyze the behavior of $Y$, if that is more convenient.
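Theorem 8.31 can also be checked by brute force: maximizing $|P[X \in S'] - P[Y \in S']|$ over all $2^{|S|}$ subsets $S'$ recovers $\Delta[X; Y]$, with the maximum attained at $S' = \{s : P[X = s] < P[Y = s]\}$. The sketch below (our own names; exponential in $|S|$, so for illustration only) does this for a three-point example.

```python
from fractions import Fraction
from itertools import chain, combinations


def statistical_distance(P, Q):
    support = set(P) | set(Q)
    return Fraction(1, 2) * sum(abs(P.get(s, 0) - Q.get(s, 0)) for s in support)


def max_over_events(P, Q):
    """max over all subsets S' of S of |P[X in S'] - P[Y in S']|, i.e. over all predicates on S."""
    S = sorted(set(P) | set(Q))
    subsets = chain.from_iterable(combinations(S, k) for k in range(len(S) + 1))
    return max(abs(sum(P.get(s, 0) - Q.get(s, 0) for s in sub)) for sub in subsets)


P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
best = {s for s in P if P[s] < Q[s]}  # the set S' singled out in Theorem 8.31
print(statistical_distance(P, Q), max_over_events(P, Q), best)  # 1/3 1/3 {'c'}
```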

Theorem 8.32. If $S$ and $T$ are finite sets, $X$ and $Y$ are random variables taking
values in $S$, and $f : S \to T$ is a function, then $\Delta[f(X); f(Y)] \le \Delta[X; Y]$.

Proof. We have

$$\begin{aligned}
\Delta[f(X); f(Y)] &= |P[f(X) \in T'] - P[f(Y) \in T']| \quad \text{for some } T' \subseteq T \text{ (by Theorem 8.31)} \\
&= |P[X \in f^{-1}(T')] - P[Y \in f^{-1}(T')]| \\
&\le \Delta[X; Y] \quad \text{(again by Theorem 8.31)}. \qquad \square
\end{aligned}$$

Example 8.41. Let $X$ be uniformly distributed over the set $\{0, \ldots, m - 1\}$, and let $Y$
be uniformly distributed over the set $\{0, \ldots, n - 1\}$, for $n \ge m$. Let $f(t) := t \bmod m$.
We want to compute an upper bound on the statistical distance between $X$ and $f(Y)$.
We can do this as follows. Let $n = qm - r$, where $0 \le r < m$, so that $q = \lceil n/m \rceil$.
Also, let $Z$ be uniformly distributed over $\{0, \ldots, qm - 1\}$. Then $f(Z)$ is uniformly
distributed over $\{0, \ldots, m - 1\}$, since every element of $\{0, \ldots, m - 1\}$ has the same
number (namely, $q$) of pre-images under $f$ which lie in the set $\{0, \ldots, qm - 1\}$.
Since statistical distance depends only on the distributions of the random variables,
by the previous theorem, we have

$$\Delta[X; f(Y)] = \Delta[f(Z); f(Y)] \le \Delta[Z; Y],$$

and as we saw in Example 8.40,

$$\Delta[Z; Y] = r/qm < 1/q \le m/n.$$

Therefore,

$$\Delta[X; f(Y)] < m/n. \qquad \square$$
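The following sketch (our own helper) computes $\Delta[X; f(Y)]$ from Example 8.41 exactly and checks it against both the intermediate bound $r/qm$ and the final bound $m/n$ over a range of small parameters.

```python
from collections import Counter
from fractions import Fraction


def distance_mod(n, m):
    """Delta[X; Y mod m] with X uniform on {0, ..., m-1} and Y uniform on {0, ..., n-1}."""
    counts = Counter(y % m for y in range(n))
    return Fraction(1, 2) * sum(abs(Fraction(counts[t], n) - Fraction(1, m)) for t in range(m))


n, m = 10, 3
q = -(-n // m)  # q = ceil(n/m)
r = q * m - n   # so that n = qm - r with 0 <= r < m
print(distance_mod(n, m), Fraction(r, q * m), Fraction(m, n))  # 1/15 1/6 3/10
```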

We close this section with two useful theorems. 


Theorem 8.33. Suppose $X$, $Y$, and $Z$ are random variables, where $X$ and $Z$ are
independent, and $Y$ and $Z$ are independent. Then $\Delta[X, Z; Y, Z] = \Delta[X; Y]$.

Note that $\Delta[X, Z; Y, Z]$ is shorthand for $\Delta[(X, Z); (Y, Z)]$.

Proof. Suppose $X$ and $Y$ take values in a finite set $S$, and $Z$ takes values in a finite
set $T$. From the definition of statistical distance,

$$\begin{aligned}
2\Delta[X, Z; Y, Z] &= \sum_{s, t} \bigl|P[(X = s) \cap (Z = t)] - P[(Y = s) \cap (Z = t)]\bigr| \\
&= \sum_{s, t} \bigl|P[X = s] \, P[Z = t] - P[Y = s] \, P[Z = t]\bigr| \quad \text{(by independence)} \\
&= \sum_{s, t} P[Z = t] \, \bigl|P[X = s] - P[Y = s]\bigr| \\
&= \Bigl(\sum_t P[Z = t]\Bigr) \Bigl(\sum_s \bigl|P[X = s] - P[Y = s]\bigr|\Bigr) \\
&= 1 \cdot 2\Delta[X; Y]. \qquad \square
\end{aligned}$$

Theorem 8.34. Let $X_1, \ldots, X_n, Y_1, \ldots, Y_n$ be random variables, where $\{X_i\}_{i=1}^n$ is
mutually independent, and $\{Y_i\}_{i=1}^n$ is mutually independent. Then we have

$$\Delta[X_1, \ldots, X_n; Y_1, \ldots, Y_n] \le \sum_{i=1}^n \Delta[X_i; Y_i].$$

Proof. Since $\Delta[X_1, \ldots, X_n; Y_1, \ldots, Y_n]$ depends only on the individual distributions
of the random variables $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$, without loss of generality,
we may assume that $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ are independent, so that
$X_1, \ldots, X_n, Y_1, \ldots, Y_n$ form a mutually independent family of random variables.
We introduce random variables $Z_0, \ldots, Z_n$, defined as follows:

$$\begin{aligned}
Z_0 &:= (X_1, \ldots, X_n), \\
Z_i &:= (Y_1, \ldots, Y_i, X_{i+1}, \ldots, X_n) \quad \text{for } i = 1, \ldots, n - 1, \text{ and} \\
Z_n &:= (Y_1, \ldots, Y_n).
\end{aligned}$$

By definition, $\Delta[X_1, \ldots, X_n; Y_1, \ldots, Y_n] = \Delta[Z_0; Z_n]$. Moreover, by part (iv) of
Theorem 8.30, we have $\Delta[Z_0; Z_n] \le \sum_{i=1}^n \Delta[Z_{i-1}; Z_i]$. Now consider any fixed
index $i = 1, \ldots, n$. By Theorem 8.33, we have

$$\begin{aligned}
\Delta[Z_{i-1}; Z_i] &= \Delta[X_i, (Y_1, \ldots, Y_{i-1}, X_{i+1}, \ldots, X_n); \; Y_i, (Y_1, \ldots, Y_{i-1}, X_{i+1}, \ldots, X_n)] \\
&= \Delta[X_i; Y_i].
\end{aligned}$$

The theorem now follows immediately. □

The technique used in the proof of the previous theorem is sometimes called
a hybrid argument, as one considers the sequence of “hybrid” random variables
$Z_0, Z_1, \ldots, Z_n$, and shows that the distance between each consecutive pair of
variables is small.
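A small exact computation illustrates Theorem 8.34. In this sketch (hypothetical names), each $X_i$ is a fair bit and each $Y_i$ is a bit that equals 1 with probability $1/3$, so $\Delta[X_i; Y_i] = 1/6$; for $n = 3$ the joint distance is $13/54$, below the hybrid bound $3 \cdot 1/6 = 1/2$.

```python
from fractions import Fraction
from itertools import product


def statistical_distance(P, Q):
    """Delta = (1/2) * sum_s |P(s) - Q(s)| for distributions given as dicts of probabilities."""
    support = set(P) | set(Q)
    return Fraction(1, 2) * sum(abs(P.get(s, 0) - Q.get(s, 0)) for s in support)


def product_distribution(marginals):
    """Joint distribution of independent coordinates with the given marginal dicts."""
    joint = {}
    for combo in product(*(list(d.items()) for d in marginals)):
        prob = Fraction(1)
        for _, q in combo:
            prob *= q
        joint[tuple(v for v, _ in combo)] = prob
    return joint


fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
biased = {0: Fraction(2, 3), 1: Fraction(1, 3)}
n = 3
joint = statistical_distance(product_distribution([fair] * n), product_distribution([biased] * n))
hybrid_bound = n * statistical_distance(fair, biased)
print(joint, hybrid_bound)  # 13/54 1/2
```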


Exercise 8.58. Let $X$ and $Y$ be independent random variables, each uniformly
distributed over $\mathbb{Z}_p$, where $p$ is prime. Calculate $\Delta[X, Y; X, XY]$.

Exercise 8.59. Let $n$ be an integer that is the product of two distinct primes of
the same bit length. Let $X$ be uniformly distributed over $\mathbb{Z}_n$, and let $Y$ be uniformly
distributed over $\mathbb{Z}_n^*$. Show that $\Delta[X; Y] \le 3n^{-1/2}$.

Exercise 8.60. Let $X$ and $Y$ be 0/1-valued random variables. Show that
$\Delta[X; Y] = |P[X = 1] - P[Y = 1]|$.

Exercise 8.61. Let $S$ be a finite set, and consider any function $\phi : S \to \{0, 1\}$.
Let $B$ be a random variable uniformly distributed over $\{0, 1\}$, and for $b = 0, 1$,
let $X_b$ be a random variable taking values in $S$, and assume that $X_b$ and $B$ are
independent. Show that

$$\bigl|P[\phi(X_B) = B] - \tfrac{1}{2}\bigr| = \tfrac{1}{2} \bigl|P[\phi(X_0) = 1] - P[\phi(X_1) = 1]\bigr| \le \tfrac{1}{2} \Delta[X_0; X_1].$$

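Exercise 8.62. Let $X, Y$ be random variables taking values in a finite set $S$. For
an event $\mathcal{B}$ that occurs with non-zero probability, define the conditional statistical
distance

$$\Delta[X; Y \mid \mathcal{B}] := \frac{1}{2} \sum_{s \in S} \bigl|P[X = s \mid \mathcal{B}] - P[Y = s \mid \mathcal{B}]\bigr|.$$

Let $\{\mathcal{B}_i\}_{i \in I}$ be a finite, pairwise disjoint family of events whose union is $\mathcal{B}$. Show
that

$$\Delta[X; Y \mid \mathcal{B}] \, P[\mathcal{B}] \le \sum_{\substack{i \in I \\ P[\mathcal{B}_i] \ne 0}} \Delta[X; Y \mid \mathcal{B}_i] \, P[\mathcal{B}_i].$$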
Exercise 8.63. Let $\{\Phi_r\}_{r \in R}$ be a family of hash functions from $S$ to $T$, with
$m := |T|$. We say $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-variationally universal if $\Phi_H(s)$ is uniformly
distributed over $T$ for each $s \in S$, and $\Delta[\Phi_H(s'); Y \mid \Phi_H(s) = t] \le \varepsilon$ for each
$s, s' \in S$ with $s \ne s'$ and each $t \in T$; here, $H$ and $Y$ are independent random
variables, with $H$ uniformly distributed over $R$, and $Y$ uniformly distributed over
$T$. Show that:

(a) if $\{\Phi_r\}_{r \in R}$ is pairwise independent, then it is 0-variationally universal;

(b) if $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-variationally universal, then it is $(1/m + \varepsilon)$-almost strongly
universal;

(c) if $\{\Phi_r\}_{r \in R}$ is $\varepsilon$-almost universal, and $\{\Phi'_{r'}\}_{r' \in R'}$ is an $\varepsilon'$-variationally universal
family of hash functions from $S'$ to $T'$, where $T \subseteq S'$, then
$\{\Phi'_{r'} \circ \Phi_r\}_{(r, r') \in R \times R'}$ is an $(\varepsilon + \varepsilon')$-variationally universal family of hash
functions from $S$ to $T'$.

Exercise 8.64. Let $\{\Phi_r\}_{r \in R}$ be a family of hash functions from $S$ to $T$ such that
(i) each $\Phi_r$ maps $S$ injectively into $T$, and (ii) there exists $\varepsilon \in [0, 1]$ such that
$\Delta[\Phi_H(s); \Phi_H(s')] \le \varepsilon$ for all $s, s' \in S$, where $H$ is uniformly distributed over $R$.
Show that $|R| \ge (1 - \varepsilon)|S|$.

Exercise 8.65. Let $X$ and $Y$ be random variables that take the same value
unless a certain event $\mathcal{F}$ occurs (i.e., $X(\omega) = Y(\omega)$ for all $\omega \notin \mathcal{F}$). Show that
$\Delta[X; Y] \le P[\mathcal{F}]$.

Exercise 8.66. Let $X$ and $Y$ be random variables taking values in the interval
$[0, t]$. Show that $|E[X] - E[Y]| \le t \cdot \Delta[X; Y]$.

Exercise 8.67. Show that Theorem 8.33 is not true if we drop the independence 
assumptions. 

Exercise 8.68. Let $S$ be a set of size $m \ge 1$. Let $F$ be a random variable that
is uniformly distributed over the set of all functions from $S$ into $S$. Let $G$ be a
random variable that is uniformly distributed over the set of all permutations of $S$.
Let $s_1, \ldots, s_n$ be distinct, fixed elements of $S$. Show that

$$\Delta[F(s_1), \ldots, F(s_n); G(s_1), \ldots, G(s_n)] \le \frac{n(n - 1)}{2m}.$$

Exercise 8.69. Let $m$ be a large integer. Consider three random experiments. In
the first, we generate a random integer $X_1$ between 1 and $m$, and then a random integer
$Y_1$ between 1 and $X_1$. In the second, we generate a random integer $X_2$ between
2 and $m$, and then generate a random integer $Y_2$ between 1 and $X_2$. In the third,
we generate a random integer $X_3$ between 2 and $m$, and then a random integer $Y_3$
between 2 and $X_3$. Show that $\Delta[X_1, Y_1; X_2, Y_2] = O(1/m)$ and $\Delta[X_2, Y_2; X_3, Y_3] =
O(\log m / m)$, and conclude that $\Delta[X_1, Y_1; X_3, Y_3] = O(\log m / m)$.


8.9 Measures of randomness and the leftover hash lemma (*) 

In this section, we discuss different ways to measure “how random” the distribution 
of a random variable is, and relations among them. 

Let $X$ be a random variable taking values in a finite set $S$ of size $m$. We define
three measures of randomness:

1. the collision probability of $X$ is $\sum_{s \in S} P[X = s]^2$;

2. the guessing probability of $X$ is $\max\{P[X = s] : s \in S\}$;

3. the distance of $X$ from uniform on $S$ is $\frac{1}{2} \sum_{s \in S} |P[X = s] - 1/m|$.

Suppose $X$ has collision probability $\beta$, guessing probability $\gamma$, and distance $\delta$
from uniform on $S$. If $X'$ is another random variable with the same distribution
as $X$, where $X$ and $X'$ are independent, then $\beta = P[X = X']$ (see Exercise 8.37). If $Y$
is a random variable that is uniformly distributed over $S$, then $\delta = \Delta[X; Y]$. If $X$
itself is uniformly distributed over $S$, then $\beta = \gamma = 1/m$, and $\delta = 0$. The quantity
$\log_2(1/\gamma)$ is sometimes called the min entropy of $X$, and the quantity $\log_2(1/\beta)$ is
sometimes called the Rényi entropy of $X$.
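The three measures are easy to compute for an explicit distribution. This sketch (function names are ours) does so with exact fractions on a small example, prints the min entropy and Rényi entropy, and checks the inequalities stated in Theorems 8.35 and 8.36 just below.

```python
from fractions import Fraction
import math


def collision_probability(P):
    return sum(q * q for q in P.values())


def guessing_probability(P):
    return max(P.values())


def distance_from_uniform(P, S):
    m = len(S)
    return Fraction(1, 2) * sum(abs(P.get(s, 0) - Fraction(1, m)) for s in S)


S = range(4)
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}  # the element 3 has probability 0
beta, gamma, delta = collision_probability(P), guessing_probability(P), distance_from_uniform(P, S)
m = len(S)
print(beta, gamma, delta)  # 3/8 1/2 1/4
print(math.log2(float(1 / gamma)), math.log2(float(1 / beta)))  # min entropy, Renyi entropy

assert beta >= Fraction(1, m)                                  # Theorem 8.35 (i)
assert gamma ** 2 <= beta <= gamma <= Fraction(1, m) + delta   # Theorem 8.35 (ii)
assert float(delta) <= 0.5 * math.sqrt(float(m * beta - 1))    # Theorem 8.36
```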

We first state some easy inequalities:

Theorem 8.35. Suppose $X$ is a random variable that takes values in a finite set $S$
of size $m$. If $X$ has collision probability $\beta$, guessing probability $\gamma$, and distance $\delta$
from uniform on $S$, then:

(i) $\beta \ge 1/m$;

(ii) $\gamma^2 \le \beta \le \gamma \le 1/m + \delta$.

Proof. Part (i) is immediate from Exercise 8.37. The other inequalities are left as
easy exercises. □

This theorem implies that the collision and guessing probabilities are minimal
for the uniform distribution, which perhaps agrees with one’s intuition.

While the above theorem implies that $\beta$ and $\gamma$ are close to $1/m$ when $\delta$ is small,
the following theorem provides a converse:

Theorem 8.36. Suppose $X$ is a random variable that takes values in a finite set $S$
of size $m$. If $X$ has collision probability $\beta$, and distance $\delta$ from uniform on $S$, then
$\delta \le \frac{1}{2} \sqrt{m\beta - 1}$.

Proof. We may assume that $\delta > 0$, since otherwise the theorem is already true,
simply from the fact that $\beta \ge 1/m$.

For $s \in S$, let $p_s := P[X = s]$. We have $\delta = \frac{1}{2} \sum_s |p_s - 1/m|$, and hence
$1 = \sum_s q_s$, where $q_s := |p_s - 1/m|/(2\delta)$. So we have

$$\begin{aligned}
\frac{1}{m} &\le \sum_s q_s^2 \quad \text{(by Exercise 8.36)} \\
&= \frac{1}{4\delta^2} \sum_s (p_s - 1/m)^2 \\
&= \frac{1}{4\delta^2} \Bigl(\sum_s p_s^2 - 1/m\Bigr) \quad \text{(again by Exercise 8.36)} \\
&= \frac{1}{4\delta^2} (\beta - 1/m),
\end{aligned}$$

from which the theorem follows immediately. □


We are now in a position to state and prove a very useful result which, intuitively,
allows us to convert a “low quality” source of randomness into a “high quality”
source of randomness, making use of an almost universal family of hash functions
(see end of §8.7).


Theorem 8.37 (Leftover hash lemma). Let $\{\Phi_r\}_{r \in R}$ be a $(1+\alpha)/m$-almost universal family of hash functions from $S$ to $T$, where $m := |T|$. Let $H$ and $X$ be independent random variables, where $H$ is uniformly distributed over $R$, and $X$ takes values in $S$. If $\beta$ is the collision probability of $X$, and $\delta'$ is the distance of $(H, \Phi_H(X))$ from uniform on $R \times T$, then $\delta' \le \frac{1}{2}\sqrt{m\beta + \alpha}$.

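Proof. Let $\beta'$ be the collision probability of $(H, \Phi_H(X))$. Our goal is to bound $\beta'$ from above, and then apply Theorem 8.36 to the random variable $(H, \Phi_H(X))$. To this end, let $\ell := |R|$, and suppose $H'$ and $X'$ are random variables, where $H'$ has the same distribution as $H$, $X'$ has the same distribution as $X$, and $H, H', X, X'$ form a mutually independent family of random variables. Then we have

$$
\begin{aligned}
\beta' &= P[(H = H') \cap (\Phi_H(X) = \Phi_{H'}(X'))] \\
&= P[(H = H') \cap (\Phi_H(X) = \Phi_H(X'))] \\
&= \frac{1}{\ell}\, P[\Phi_H(X) = \Phi_H(X')] && \text{(a special case of Exercise 8.15)} \\
&\le \frac{1}{\ell}\big(P[X = X'] + (1+\alpha)/m\big) && \text{(by Exercise 8.48)} \\
&= \frac{1}{\ell m}(m\beta + 1 + \alpha).
\end{aligned}
$$

The theorem now follows immediately from Theorem 8.36: since $R \times T$ has size $\ell m$, we get $\delta' \le \frac{1}{2}\sqrt{\ell m \beta' - 1} \le \frac{1}{2}\sqrt{m\beta + \alpha}$. □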
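For tiny parameters the distance $\delta'$ in the leftover hash lemma can be computed exactly by brute force. The Python sketch below assumes a specific family that the text does not fix: all $k \times n$ matrices over GF(2), acting on $n$-bit strings by matrix-vector multiplication. This family is universal (so $\alpha = 0$), because for $s \ne s'$ a uniformly random matrix sends the nonzero vector $s \oplus s'$ to 0 with probability exactly $2^{-k}$. All function names are hypothetical.

```python
# Illustrative brute-force check of Theorem 8.37.
# Assumption: the hash family is all k x n matrices over GF(2) (a universal family).
import itertools
import math
from fractions import Fraction


def parity(x):
    """Parity (0 or 1) of the number of 1 bits in the non-negative integer x."""
    return bin(x).count("1") & 1


def apply_matrix(rows, s):
    """Multiply a k x n matrix over GF(2) by the n-bit vector s.

    Each row is an n-bit integer; the k output bits are packed into an integer.
    """
    out = 0
    for row in rows:
        out = (out << 1) | parity(row & s)
    return out


def lhl_distance(support, n, k):
    """Exact distance of (H, Phi_H(X)) from uniform on R x T, where H is a uniformly
    random k x n matrix over GF(2) and X is uniform on `support` (a list of n-bit ints)."""
    m = 2 ** k
    keys = list(itertools.product(range(2 ** n), repeat=k))   # R = all k x n matrices
    total = Fraction(0)
    for rows in keys:
        counts = [0] * m
        for s in support:
            counts[apply_matrix(rows, s)] += 1
        # distance of Phi_rows(X) from uniform on T
        total += sum(abs(Fraction(c, len(support)) - Fraction(1, m)) for c in counts) / 2
    # H is independent of X, so the joint distance is the average over all keys
    return total / len(keys)


n, k = 6, 2                      # S = {0,1}^6, T = {0,1}^2, so m = 4
support = list(range(16))        # X uniform on a 16-element subset S', so beta = 1/16
delta_prime = lhl_distance(support, n, k)
bound = 0.5 * math.sqrt(2 ** k / len(support) + 0)   # (1/2) sqrt(m*beta + alpha), alpha = 0
print(float(delta_prime), bound)
```

The averaging step uses the fact that, for $H$ uniform and independent of $X$, the distance of $(H, \Phi_H(X))$ from uniform on $R \times T$ equals the average over $r \in R$ of the distance of $\Phi_r(X)$ from uniform on $T$.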

In the previous theorem, if $\{\Phi_r\}_{r \in R}$ is a universal family of hash functions, then we can take $\alpha = 0$. However, it is convenient to allow $\alpha > 0$, as this allows for the use of families with a smaller key set (see Exercise 8.52).

Example 8.42. Suppose $S := \{0,1\}^{\times 1000}$, $T := \{0,1\}^{\times 64}$, and that $\{\Phi_r\}_{r \in R}$ is a universal family of hash functions from $S$ to $T$. Suppose $X$ and $H$ are independent random variables, where $X$ is uniformly distributed over some subset $S'$ of $S$ of size $\ge 2^{160}$, and $H$ is uniformly distributed over $R$. Then the collision and guessing probabilities of $X$ are at most $2^{-160}$, and so the leftover hash lemma (with $\alpha = 0$) says that the distance of $(H, \Phi_H(X))$ from uniform on $R \times T$ is $\delta'$, where $\delta' \le \frac{1}{2}\sqrt{2^{64} \cdot 2^{-160}} = 2^{-49}$. By Theorem 8.32, it follows that the distance of $\Phi_H(X)$ from uniform on $T$ is at most $\delta' \le 2^{-49}$. □
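The arithmetic in Example 8.42 is easy to reproduce. The sketch below (the helper name `lhl_bound` is ours) evaluates the bound $\frac{1}{2}\sqrt{m\beta + \alpha}$ with $m = 2^{64}$ and $\beta = 2^{-160}$; it forms $m\beta$ as a single power of two so that no huge or tiny intermediate values appear.

```python
# Illustrative sketch; the helper name is ours.
import math


def lhl_bound(output_bits, log2_inv_beta, alpha=0.0):
    """The leftover-hash-lemma bound (1/2) * sqrt(m*beta + alpha) of Theorem 8.37,
    with m = 2**output_bits and collision probability beta = 2**(-log2_inv_beta)."""
    m_beta = 2.0 ** (output_bits - log2_inv_beta)   # m*beta as a single power of two
    return 0.5 * math.sqrt(m_beta + alpha)


# Parameters of Example 8.42: 64-bit output, collision probability at most 2**-160.
print(math.log2(lhl_bound(64, 160)))
```

Every intermediate value here is an exact power of two, so the floating-point result is exactly $2^{-49}$.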

The leftover hash lemma allows one to convert “low quality” sources of randomness into “high quality” sources of randomness. Suppose that to conduct an experiment, we need to sample a random variable $Y$ whose distribution is uniform on a set $T$ of size $m$, or at least, its distance from uniform on $T$ is sufficiently small. However, we may not have direct access to a source of “real” randomness whose distribution looks anything like that of the desired uniform distribution, but rather, only to a “low quality” source of randomness. For example, one could model various characteristics of a person’s typing at the keyboard, or perhaps various characteristics of the internal state of a computer (both its software and hardware) as a random process. We cannot say very much about the probability distributions associated with such processes, but perhaps we can conservatively estimate the collision or guessing probabilities associated with these distributions. Using the leftover hash lemma, we can hash the output of this random process, using a suitably generated random hash function. The hash function acts like a “magnifying glass”: it “focuses” the randomness inherent in the “low quality” source distribution onto the set $T$, obtaining a “high quality,” nearly uniform, distribution on $T$.

Of course, this approach requires a random hash function, which may be just as difficult to generate as a random element of $T$. The following theorem shows, however, that we can at least use the same “magnifying glass” many times over, with the statistical distance from uniform of the output distribution increasing linearly in the number of applications of the hash function.

Theorem 8.38. Let $\{\Phi_r\}_{r \in R}$ be a $(1+\alpha)/m$-almost universal family of hash functions from $S$ to $T$, where $m := |T|$. Let $H, X_1, \ldots, X_n$ be random variables, where $H$ is uniformly distributed over $R$, each $X_i$ takes values in $S$, and $H, X_1, \ldots, X_n$ form a mutually independent family of random variables. If $\beta$ is an upper bound on the collision probability of each $X_i$, and $\delta'$ is the distance of $(H, \Phi_H(X_1), \ldots, \Phi_H(X_n))$ from uniform on $R \times T^{\times n}$, then $\delta' \le \frac{n}{2}\sqrt{m\beta + \alpha}$.
Proof. Let $Y_1, \ldots, Y_n$ be random variables, each uniformly distributed over $T$, and assume that $H, X_1, \ldots, X_n, Y_1, \ldots, Y_n$ form a mutually independent family of random variables. We shall make a hybrid argument (as in the proof of Theorem 8.34). Define random variables $Z_0, Z_1, \ldots, Z_n$ as follows:

$$
\begin{aligned}
Z_0 &:= (H, \Phi_H(X_1), \ldots, \Phi_H(X_n)), \\
Z_i &:= (H, Y_1, \ldots, Y_i, \Phi_H(X_{i+1}), \ldots, \Phi_H(X_n)) \quad \text{for } i = 1, \ldots, n-1, \text{ and} \\
Z_n &:= (H, Y_1, \ldots, Y_n).
\end{aligned}
$$

We have

$$
\begin{aligned}
\delta' &= \Delta[Z_0; Z_n] \\
&\le \sum_{i=1}^{n} \Delta[Z_{i-1}; Z_i] && \text{(by part (iv) of Theorem 8.30)} \\
&\le \sum_{i=1}^{n} \Delta[H, Y_1, \ldots, Y_{i-1}, \Phi_H(X_i), X_{i+1}, \ldots, X_n;\; H, Y_1, \ldots, Y_{i-1}, Y_i, X_{i+1}, \ldots, X_n] && \text{(by Theorem 8.32)} \\
&= \sum_{i=1}^{n} \Delta[H, \Phi_H(X_i);\; H, Y_i] && \text{(by Theorem 8.33)} \\
&\le \frac{n}{2}\sqrt{m\beta + \alpha} && \text{(by Theorem 8.37)}.
\end{aligned}
$$

□

Another source of “low quality” randomness arises in certain cryptographic applications, where we have a “secret value” $X$, which is a random variable that takes values in a set $S$, and which has small collision or guessing probability. We want to derive from $X$ a “secret key” whose distance from uniform on some specified “key space” $T$ is small. Typically, $T$ is the set of all bit strings of some given length, as in Example 8.25. Theorem 8.38 allows us to do this using a “public” hash function, one that is generated at random once and for all, published for all to see, and used over and over to derive secret keys as needed. However, to apply this theorem, it is crucial that the secret values (and the hash key) are mutually independent.
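Theorem 8.38 also tells us how many keys one public hash function can safely derive: the bound $\frac{n}{2}\sqrt{m\beta + \alpha}$ grows linearly in $n$. The sketch below computes the largest $n$ that keeps the bound within a chosen budget. The key length, collision probability, and budget in the usage line are hypothetical parameters picked for illustration, and the calculation is only meaningful under the theorem’s independence assumption.

```python
# Illustrative sketch; the helper name and the example parameters are hypothetical.
import math


def max_key_derivations(key_bits, log2_inv_beta, target_log2, alpha=0.0):
    """Largest n with (n/2) * sqrt(m*beta + alpha) <= 2**target_log2 (Theorem 8.38),
    where m = 2**key_bits and every secret value has collision probability at most
    2**(-log2_inv_beta).  Assumes the secret values and the hash key are independent."""
    per_derivation = 0.5 * math.sqrt(2.0 ** (key_bits - log2_inv_beta) + alpha)
    return math.floor(2.0 ** target_log2 / per_derivation)


# Hypothetical parameters: 128-bit keys, secrets with collision probability <= 2**-256,
# and an overall distance budget of 2**-40.
print(max_key_derivations(128, 256, -40))
```

With these numbers each derivation costs at most $2^{-65}$, so the budget $2^{-40}$ allows $2^{25}$ derivations; a positive $\alpha$ (a smaller key set) shrinks that count quickly.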




Exercise 8.70. Consider again the situation in Theorem 8.37. Suppose that $T = \{0, \ldots, m-1\}$, but that we would rather have a nearly uniform distribution on $T' = \{0, \ldots, m'-1\}$, for some $m' < m$. While it may be possible to work with a different family of hash functions, we do not have to if $m$ is large enough with respect to $m'$, in which case we can just use the value $Y' := \Phi_H(X) \bmod m'$. Show that the distance of $(H, Y')$ from uniform on $R \times T'$ is at most $\frac{1}{2}\sqrt{m\beta + \alpha} + m'/m$.
Exercise 8.71. Let $\{\Phi_r\}_{r \in R}$ be a $(1+\alpha)/m$-almost universal family of hash functions from $S$ to $T$, where $m := |T|$. Suppose $H, X, Y, Z$ are random variables, where $H$ is uniformly distributed over $R$, $X$ takes values in $S$, $Y$ is uniformly distributed over $T$, and $U$ is the set of values taken by $Z$ with non-zero probability. Assume that the family of random variables $H, Y, (X, Z)$ is mutually independent.

(a) For $u \in U$, define $\beta(u) := \sum_{s \in S} P[X = s \mid Z = u]^2$. Also, let $\beta' := \sum_{u \in U} \beta(u) P[Z = u]$. Show that $\Delta[H, \Phi_H(X), Z;\ H, Y, Z] \le \frac{1}{2}\sqrt{m\beta' + \alpha}$.

(b) Suppose that $X$ is uniformly distributed over a subset $S'$ of $S$, and that $Z = f(X)$ for some function $f : S \to U$. Show that $\Delta[H, \Phi_H(X), Z;\ H, Y, Z] \le \frac{1}{2}\sqrt{m|U|/|S'| + \alpha}$.


8.10 Discrete probability distributions 

In addition to working with probability distributions over finite sample spaces, one can also work with distributions over infinite sample spaces. If the sample space is countable, that is, either finite or countably infinite (see §A3), then the distribution is called a discrete probability distribution. We shall not consider any other types of probability distributions in this text. The theory developed in §§8.1–8.5 extends fairly easily to the countably infinite setting, and in this section, we discuss how this is done.


8.10.1 Basic definitions 

To say that the sample space $\Omega$ is countably infinite simply means that there is a bijection $f$ from the set of positive integers onto $\Omega$; thus, we can enumerate the elements of $\Omega$ as $\omega_1, \omega_2, \omega_3, \ldots$, where $\omega_i := f(i)$.

As in the finite case, a probability distribution on $\Omega$ is a function $P : \Omega \to [0,1]$, where all the probabilities sum to 1, which means that the infinite series $\sum_{i=1}^{\infty} P(\omega_i)$ converges to one. Luckily, the convergence properties of an infinite series whose terms are all non-negative are invariant under a reordering of terms (see §A6), so it does not matter how we enumerate the elements of $\Omega$.

Example 8.43. Suppose we toss a fair coin repeatedly until it comes up heads, and let $k$ be the total number of tosses. We can model this experiment as a discrete probability distribution $P$, where the sample space consists of the set of all positive integers: for each positive integer $k$, $P(k) := 2^{-k}$. We can check that indeed $\sum_{k=1}^{\infty} 2^{-k} = 1$, as required.

One may be tempted to model this experiment by setting up a probability distribution on the sample space of all infinite sequences of coin tosses; however, this sample space is not countably infinite, and so we cannot construct a discrete probability distribution on this space. While it is possible to extend the notion of a probability distribution to such spaces, this would take us too far afield. □

Example 8.44. More generally, suppose we repeatedly execute a Bernoulli trial until it succeeds, where each execution succeeds with probability $p > 0$ independently of the previous trials, and let $k$ be the total number of trials executed. Then we associate the probability $P(k) := q^{k-1} p$ with each positive integer $k$, where $q := 1 - p$, since we have $k - 1$ failures before the one success. One can easily check that these probabilities sum to 1. Such a distribution is called a geometric distribution. □
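The geometric distribution is easy to explore numerically. The Python sketch below (function names are ours) evaluates $P(k) = q^{k-1}p$ exactly, confirms that the first 50 terms miss exactly $q^{50}$ of the total mass, and samples the distribution by literally running Bernoulli trials until the first success. Example 8.43 is the special case $p = 1/2$.

```python
# Illustrative sketch; function names and the seed are ours.
import random
from fractions import Fraction


def geometric_pmf(k, p):
    """P(k) = q**(k-1) * p for k >= 1, where q = 1 - p (Example 8.44)."""
    return (1 - p) ** (k - 1) * p


def sample_trials(p, rng):
    """Run independent Bernoulli(p) trials until the first success and return the
    number of trials executed; its distribution is geometric with parameter p."""
    k = 1
    while rng.random() >= p:   # rng.random() < p is a success, which happens with probability p
        k += 1
    return k


p = Fraction(1, 3)
partial = sum(geometric_pmf(k, p) for k in range(1, 51))
print(1 - partial == (1 - p) ** 50)   # the first 50 terms miss exactly q**50 of the mass

rng = random.Random(7)
samples = [sample_trials(1 / 3, rng) for _ in range(20000)]
print(sum(samples) / len(samples))     # an empirical average of the number of trials
```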

Example 8.45. The series $\sum_{k=1}^{\infty} 1/k^3$ converges to some positive number $c$. Therefore, we can define a probability distribution on the set of positive integers, where we associate with each $k \ge 1$ the probability $1/(ck^3)$. □

As in the finite case, an event is an arbitrary subset $A$ of $\Omega$. The probability $P[A]$ of $A$ is defined as the sum of the probabilities associated with the elements of $A$. This sum is treated as an infinite series when $A$ is infinite. This series is guaranteed to converge, and its value does not depend on the particular enumeration of the elements of $A$.

Example 8.46. Consider the geometric distribution discussed in Example 8.44, where $p$ is the success probability of each Bernoulli trial, and $q := 1 - p$. For a given integer $i \ge 1$, consider the event $A$ that the number of trials executed is at least $i$. Formally, $A$ is the set of all integers greater than or equal to $i$. Intuitively, $P[A]$ should be $q^{i-1}$, since we perform at least $i$ trials if and only if the first $i - 1$ trials fail. Just to be sure, we can compute

$$
P[A] = \sum_{k \ge i} P(k) = \sum_{k \ge i} q^{k-1} p = q^{i-1} p \sum_{k \ge 0} q^k = q^{i-1} p \cdot \frac{1}{1 - q} = q^{i-1}.
$$

□
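The computation above can also be checked on finite truncations in exact arithmetic: summing the first $N$ terms of the series for $P[A]$ gives $q^{i-1}(1 - q^N) = q^{i-1} - q^{i-1+N}$, which tends to $q^{i-1}$. The Python sketch below (the helper name `truncated_tail` is ours) verifies this identity for a few values of $i$ with $p = 1/4$ and $N = 40$.

```python
# Illustrative sketch; the helper name and parameters are ours.
from fractions import Fraction


def truncated_tail(i, p, terms):
    """Sum of P(k) = q**(k-1) * p for i <= k < i + terms: the first `terms`
    summands of the series for P[A] in Example 8.46."""
    q = 1 - p
    return sum(q ** (k - 1) * p for k in range(i, i + terms))


p = Fraction(1, 4)
q = 1 - p
for i in range(1, 6):
    s = truncated_tail(i, p, 40)
    # the truncation falls short of q**(i-1) by exactly q**(i-1+40), which tends to 0
    assert s == q ** (i - 1) - q ** (i - 1 + 40)
print("tail identity verified for i = 1..5")
```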

It is an easy matter to check that all the statements and theorems in §8.1 carry 
over verbatim to the case of countably infinite sample spaces. Moreover, Boole’s 
inequality (8.6) and equality (8.7) are also valid for countably infinite families of 
events: 

Theorem 8.39. Suppose $A := \bigcup_{i=1}^{\infty} A_i$, where $\{A_i\}_{i=1}^{\infty}$ is an infinite sequence of events. Then

(i) $P[A] \le \sum_{i=1}^{\infty} P[A_i]$, and

(ii) $P[A] = \sum_{i=1}^{\infty} P[A_i]$ if $\{A_i\}_{i=1}^{\infty}$ is pairwise disjoint.

Proof. As in the proof of Theorem 8.1, for $\omega \in \Omega$ and $B \subseteq \Omega$, define $\delta_\omega[B] := 1$ if $\omega \in B$, and $\delta_\omega[B] := 0$ if $\omega \notin B$. First, suppose that $\{A_i\}_{i=1}^{\infty}$ is pairwise disjoint. Evidently, $\delta_\omega[A] = \sum_{i=1}^{\infty} \delta_\omega[A_i]$ for each $\omega \in \Omega$, and so

$$
P[A] = \sum_{\omega \in \Omega} P(\omega)\delta_\omega[A] = \sum_{\omega \in \Omega} P(\omega) \sum_{i=1}^{\infty} \delta_\omega[A_i] = \sum_{i=1}^{\infty} \sum_{\omega \in \Omega} P(\omega)\delta_\omega[A_i] = \sum_{i=1}^{\infty} P[A_i],
$$

where we use the fact that we may reverse the order of summation in an infinite double summation of non-negative terms (see §A7). That proves (ii), and (i) follows from (ii), applied to the sequence $\{A'_i\}_{i=1}^{\infty}$, where $A'_i := A_i \setminus \bigcup_{j=1}^{i-1} A_j$, as $P[A] = \sum_{i=1}^{\infty} P[A'_i] \le \sum_{i=1}^{\infty} P[A_i]$. □


8.10.2 Conditional probability and independence 

All of the definitions and results in §8.2 carry over verbatim to the countably infinite case. The law of total probability (equations (8.9) and (8.10)), as well as Bayes’ theorem (8.11), extend to families of events $\{B_i\}_{i \in I}$ indexed by any countably infinite set $I$. The definitions of independent families of events ($k$-wise and mutually) extend verbatim to infinite families.


8.10.3 Random variables 

All of the definitions and results in §8.3 carry over verbatim to the countably infinite case. Note that the image of a random variable may be either finite or countably infinite. The definitions of independent families of random variables ($k$-wise and mutually) extend verbatim to infinite families.


8.10.4 Expectation and variance 

We define the expected value of a real-valued random variable $X$ exactly as in (8.18); that is, $E[X] := \sum_{\omega \in \Omega} X(\omega) P(\omega)$, but where this sum is now an infinite series. If this series converges absolutely (see §A6), then we say that $X$ has finite expectation, or that $E[X]$ is finite. In this case, the series defining $E[X]$ converges to the same finite limit, regardless of the ordering of the terms.

If $E[X]$ is not finite, then under the right conditions, $E[X]$ may still exist, although its value will be $\pm\infty$. Consider first the case where $X$ takes only non-negative values. In this case, if $E[X]$ is not finite, then we naturally define $E[X] := \infty$, as the series defining $E[X]$ diverges to $\infty$, regardless of the ordering of the terms. In the general case, we may define random variables $X^+$ and $X^-$, where

$$
X^+(\omega) := \max\{0, X(\omega)\} \quad \text{and} \quad X^-(\omega) := \max\{0, -X(\omega)\},
$$
so that $X = X^+ - X^-$, and both $X^+$ and $X^-$ take only non-negative values. Clearly, $X$ has finite expectation if and only if both $X^+$ and $X^-$ have finite expectation. Now suppose that $E[X]$ is not finite, so that one of $E[X^+]$ or $E[X^-]$ is infinite. If $E[X^+] = E[X^-] = \infty$, then we say that $E[X]$ does not exist; otherwise, we define $E[X] := E[X^+] - E[X^-]$, which is $\pm\infty$; in this case, the series defining $E[X]$ diverges to $\pm\infty$, regardless of the ordering of the terms.

Example 8.47. Let $X$ be a random variable whose distribution is as in Example 8.45. Since the series $\sum_{k=1}^{\infty} 1/k^2$ converges and the series $\sum_{k=1}^{\infty} 1/k$ diverges, the expectation $E[X]$ is finite, while $E[X^2] = \infty$. One may also verify that the random variable $(-1)^X X^2$ has no expectation. □

All of the results in §8.4 carry over essentially unchanged, although one must 
pay some attention to “convergence issues.” 

If $E[X]$ exists, then we can regroup the terms in the series $\sum_{\omega \in \Omega} X(\omega) P(\omega)$, without affecting its value. In particular, equation (8.19) holds provided $E[X]$ exists, and equation (8.20) holds provided $E[f(X)]$ exists.

Theorem 8.14 still holds, under the additional hypothesis that $E[X]$ and $E[Y]$ are finite. Equation (8.21) also holds, provided the individual expectations $E[X_i]$ are finite. More generally, if $E[X]$ and $E[Y]$ exist, then $E[X + Y] = E[X] + E[Y]$, unless $E[X] = \infty$ and $E[Y] = -\infty$, or $E[X] = -\infty$ and $E[Y] = \infty$. Also, if $E[X]$ exists, then $E[aX] = a E[X]$, unless $a = 0$ and $E[X] = \pm\infty$.

One might consider generalizing (8.21) to countably infinite families of random variables. To this end, suppose $\{X_i\}_{i=1}^{\infty}$ is an infinite sequence of real-valued random variables. The random variable $X := \sum_{i=1}^{\infty} X_i$ is well defined, provided the series $\sum_{i=1}^{\infty} X_i(\omega)$ converges for each $\omega \in \Omega$. One might hope that $E[X] = \sum_{i=1}^{\infty} E[X_i]$; however, this is not in general true, even if the individual expectations, $E[X_i]$, are non-negative, and even if the series defining $X$ converges absolutely for each $\omega$; nevertheless, it is true when the $X_i$’s are non-negative:

Theorem 8.40. Let $\{X_i\}_{i=1}^{\infty}$ be an infinite sequence of random variables. Suppose that for each $i \ge 1$, $X_i$ takes non-negative values only, and has finite expectation. Also suppose that $\sum_{i=1}^{\infty} X_i(\omega)$ converges for each $\omega \in \Omega$, and define $X := \sum_{i=1}^{\infty} X_i$. Then we have

$$
E[X] = \sum_{i=1}^{\infty} E[X_i].
$$

Proof. This is a calculation just like the one made in the proof of Theorem 8.39, where, again, we use the fact that we may reverse the order of summation in an infinite double summation of non-negative terms:

$$
E[X] = \sum_{\omega \in \Omega} P(\omega) X(\omega) = \sum_{\omega \in \Omega} P(\omega) \sum_{i=1}^{\infty} X_i(\omega) = \sum_{i=1}^{\infty} \sum_{\omega \in \Omega} P(\omega) X_i(\omega) = \sum_{i=1}^{\infty} E[X_i].
$$

□

Theorem 8.15 holds under the additional hypothesis that $E[X]$ and $E[Y]$ are finite. Equation (8.22) also holds, provided the individual expectations $E[X_i]$ are finite. Theorem 8.16 still holds, of course. Theorem 8.17 also holds, but where now the sum may be infinite; it can be proved using essentially the same argument as in the finite case, combined with Theorem 8.40.


Example 8.48. Suppose $X$ is a random variable with a geometric distribution, as in Example 8.44, with an associated success probability $p$ and failure probability $q := 1 - p$. As we saw in Example 8.46, for every integer $i \ge 1$, we have $P[X \ge i] = q^{i-1}$. We may therefore apply the infinite version of Theorem 8.17 to easily compute the expected value of $X$:

$$
E[X] = \sum_{i \ge 1} P[X \ge i] = \sum_{i \ge 1} q^{i-1} = \frac{1}{1 - q} = \frac{1}{p}.
$$

□
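The two ways of computing $E[X]$ for a geometric $X$, summing $P[X \ge i]$ as above or summing $k \cdot P(k)$ directly, can be compared on partial sums. The Python sketch below (the helper name is ours) does this exactly for $p = 1/5$ and $N = 200$ terms. The tail-sum partial sum equals $(1 - q^N)/p$, and it always dominates the direct partial sum, since it counts every outcome $k$ with weight $\min(k, N)$.

```python
# Illustrative sketch; the helper name and parameters are ours.
from fractions import Fraction


def geometric_mean_partial_sums(p, N):
    """Return the N-term partial sums of sum_i P[X >= i] and of sum_k k * P(k)
    for X geometric with success probability p; both tend to E[X] = 1/p."""
    q = 1 - p
    tail_sum = sum(q ** (i - 1) for i in range(1, N + 1))
    direct_sum = sum(k * q ** (k - 1) * p for k in range(1, N + 1))
    return tail_sum, direct_sum


p = Fraction(1, 5)
tail_sum, direct_sum = geometric_mean_partial_sums(p, 200)
print(float(tail_sum), float(direct_sum), float(1 / p))
```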


Example 8.49. To illustrate that Theorem 8.40 does not hold in general, consider the geometric distribution on the positive integers, where $P(j) = 2^{-j}$ for $j \ge 1$. For $i \ge 1$, define the random variable $X_i$ so that $X_i(i) = 2^i$, $X_i(i+1) = -2^{i+1}$, and $X_i(j) = 0$ for all $j \notin \{i, i+1\}$. Then $E[X_i] = 0$ for all $i \ge 1$, and so $\sum_{i \ge 1} E[X_i] = 0$. Now define $X := \sum_{i \ge 1} X_i$. This is well defined, and in fact $X(1) = 2$, while $X(j) = 0$ for all $j > 1$. Hence $E[X] = 1$. □
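Because each $X_i$ is nonzero on only two outcomes, every quantity in Example 8.49 can be computed exactly. The Python sketch below (names are ours; we examine outcomes $1, \ldots, J$ with $J = 60$) confirms that each $E[X_i]$ is 0 while $X(1) = 2$ and $X(j) = 0$ for $2 \le j \le J$, so $E[X] = 1$.

```python
# Illustrative sketch of Example 8.49; names and the cutoff J are ours.
from fractions import Fraction


def P(j):
    """P(j) = 2**-j on the positive integers."""
    return Fraction(1, 2 ** j)


def X_i(i, j):
    """The random variable X_i of Example 8.49 evaluated at the outcome j."""
    if j == i:
        return 2 ** i
    if j == i + 1:
        return -(2 ** (i + 1))
    return 0


J = 60   # outcomes examined: 1..J
# Each X_i with i < J is nonzero only on {i, i+1}, so this sum is exactly E[X_i].
expectations = [sum(X_i(i, j) * P(j) for j in range(1, J + 1)) for i in range(1, J)]
# X(j) is the sum over all i of X_i(j); only X_{j-1} and X_j can be nonzero at j.
X = {j: sum(X_i(i, j) for i in range(max(1, j - 1), j + 1)) for j in range(1, J + 1)}
# X(j) = 0 for j > 1 (checked here for j <= J), so E[X] reduces to X(1) * P(1).
E_X = sum(X[j] * P(j) for j in X)
print(all(e == 0 for e in expectations), X[1], E_X)
```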


The variance $\mathrm{Var}[X]$ of $X$ exists only when $\mu := E[X]$ is finite, in which case it is defined as usual as $E[(X - \mu)^2]$, which may be either finite or infinite. Theorems 8.18, 8.19, and 8.20 hold provided all the relevant expectations and variances are finite.

The definition of conditional expectation carries over verbatim. Equation (8.23) holds, provided $E[X \mid B]$ exists, and the law of total expectation (8.24) holds, provided $E[X]$ exists. The law of total expectation also holds for a countably infinite partition $\{B_i\}_{i \in I}$, provided $E[X]$ exists, and each of the conditional expectations $E[X \mid B_i]$ is finite.





8.10.5 Some useful bounds 

All of the results in this section hold, provided the relevant expectations and variances are finite.

Exercise 8.72. Let $\{A_i\}_{i=1}^{\infty}$ be a family of events, such that $A_i \subseteq A_{i+1}$ for each $i \ge 1$, and let $A := \bigcup_{i=1}^{\infty} A_i$. Show that $P[A] = \lim_{i \to \infty} P[A_i]$.

Exercise 8.73. Generalize Exercises 8.6, 8.7, 8.23, and 8.24 to the discrete setting, allowing a countably infinite index set $I$.

Exercise 8.74. Suppose $X$ is a random variable taking positive integer values, and that for some real number $q$, with $0 < q < 1$, and for all integers $i \ge 1$, we have $P[X \ge i] = q^{i-1}$. Show that $X$ has a geometric distribution with associated success probability $p := 1 - q$.

Exercise 8.75. This exercise extends Jensen’s inequality (see Exercise 8.25) to the discrete setting. Suppose that $f$ is a convex function on an interval $I$. Let $X$ be a random variable whose image is a countably infinite subset of $I$, and assume that both $E[X]$ and $E[f(X)]$ are finite. Show that $E[f(X)] \ge f(E[X])$. Hint: use continuity.

Exercise 8.76. A gambler plays a simple game in a casino: with each play of 
the game, the gambler may bet any number m of dollars; a fair coin is tossed, and 
if it comes up heads, the casino pays m dollars to the gambler, and otherwise, the
gambler pays m dollars to the casino. The gambler plays the game repeatedly, 
using the following strategy: he initially bets a dollar, and with each subsequent 
play, he doubles his bet; if he ever wins, he quits and goes home; if he runs out of 
money, he also goes home; otherwise, he plays again. Show that if the gambler has 
an infinite amount of money, then his expected winnings are one dollar, and if he 
has a finite amount of money, his expected winnings are zero. 


8.11 Notes 

The idea of sharing a secret via polynomial evaluation and interpolation (see Example 8.28) is due to Shamir [90].

Our Chernoff bound (Theorem 8.24) is one of a number of different types of 
bounds that appear in the literature under the rubric of “Chernoff bound.” 

Universal and pairwise independent hash functions, with applications to hash tables and message authentication codes, were introduced by Carter and Wegman [25, 105]. The notions of ε-almost universal and ε-almost strongly universal hashing were developed by Stinson [101]. The notion of ε-variationally universal hashing (see Exercise 8.63) is from Krovetz and Rogaway [57].

The leftover hash lemma (Theorem 8.37) was originally stated and proved by 
Impagliazzo, Levin, and Luby [48], who use it to obtain an important result in the 
theory of cryptography. Our proof of the leftover hash lemma is loosely based on 
one by Impagliazzo and Zuckermann [49], who also present further applications. 
It is sometimes useful to endow our algorithms with the ability to generate random numbers. In fact, we have already seen two examples of how such probabilistic algorithms may be useful:

• at the end of §3.4, we saw how a probabilistic algorithm might be used to build a simple and efficient primality test; however, this test might incorrectly assert that a composite number is prime; in the next chapter, we will see how a small modification to this algorithm will ensure that the probability of making such a mistake is extremely small;

• in §4.5, we saw how a probabilistic algorithm could be used to make Fermat’s two squares theorem constructive; in this case, the use of randomization never leads to incorrect results, but the running time of the algorithm was only bounded “in expectation.”

We will see a number of other probabilistic algorithms in this text, and it is high time that we place them on a firm theoretical foundation. To simplify matters, we only consider algorithms that generate random bits. Where such random bits actually come from will not be of great concern to us here. In a practical implementation, one would use a pseudo-random bit generator, which should produce bits that “for all practical purposes” are “as good as random.” While there is a well-developed theory of pseudo-random bit generation (some of which builds on the ideas in §8.9), we will not delve into this here. Moreover, the pseudo-random bit generators used in practice are not based on this general theory, and are much more ad hoc in design. So, although we will present a rigorous formal theory of probabilistic algorithms, the application of this theory to practice is ultimately a bit heuristic; nevertheless, experience with these algorithms has shown that the theory is a very good predictor of the real-world behavior of these algorithms.
9.1 Basic definitions

Formally speaking, we will add a new type of instruction to our random access machine (described in §3.2):

random bit: This type of instruction is of the form y ← RAND, where y takes the same form as in arithmetic instructions. Execution of this type of instruction assigns to y a value sampled from the uniform distribution on {0,1}, independently from the execution of all other random-bit instructions.

Algorithms that use random-bit instructions are called probabilistic (or randomized), while those that do not are called deterministic.

In describing probabilistic algorithms at a high level, we shall write “$y \xleftarrow{\$} \{0,1\}$” to denote the assignment of a random bit to the variable $y$, and “$y \xleftarrow{\$} \{0,1\}^{\times \ell}$” to denote the assignment of a random bit string of length $\ell$ to the variable $y$.

To analyze the behavior of a probabilistic algorithm, we first need a probability 
distribution that appropriately models its execution. Once we have done this, we 
shall define the running time and output to be random variables associated with 
this distribution. 


9.1.1 Defining the distribution 

It would be desirable to define a probability distribution that could be used for all 
algorithms and all inputs. While this can be done in principle, it would require 
notions from the theory of probability more advanced than those we developed in 
the previous chapter. Instead, for a given probabilistic algorithm A and input x, we 
shall define a discrete probability distribution that models A’s execution on input 
x. Thus, every algorithm/input pair yields a different distribution. 

To motivate our definition, consider Example 8.43. We could view the sample space in that example to be the set of all bit strings consisting of zero or more 0 bits, followed by a single 1 bit, and to each such bit string $\omega$ of this special form, we assign the probability $2^{-|\omega|}$, where $|\omega|$ denotes the length of $\omega$. The “random experiment” we have in mind is to generate random bits one at a time until one of these special “halting” strings is generated. In developing the definition of the probability distribution for a probabilistic algorithm, we simply consider more general sets of “halting” strings, as determined by the algorithm and its input.

So consider a fixed algorithm A and input x. Let $\lambda$ be a finite bit string of length, say, $\ell$. We can use $\lambda$ to “drive” the execution of A on input x for up to $\ell$ execution steps, as follows: for each step $i = 1, \ldots, \ell$, if the $i$th instruction executed by A is y ← RAND, the $i$th bit of $\lambda$ is assigned to y. In this context, we shall refer to $\lambda$ as an execution path. The reader may wish to visualize $\lambda$ as a finite path in an infinite binary tree, where we start at the root, branching to the left if the next bit in $\lambda$ is a 0 bit, and branching to the right if the next bit in $\lambda$ is a 1 bit.

After using $\lambda$ to drive A on input x for up to $\ell$ steps, we might find that the algorithm executed a halt instruction at some point during the execution, in which case we call $\lambda$ a complete execution path; moreover, if this halt instruction was the $\ell$th instruction executed by A, then we call $\lambda$ an exact execution path.

Our intent is to define the probability distribution associated with A on input x to be $P : \Omega \to [0,1]$, where the sample space $\Omega$ is the set of all exact execution paths, and $P(\omega) := 2^{-|\omega|}$ for each $\omega \in \Omega$. However, for this to work, all the probabilities must sum to 1. The next theorem at least guarantees that these probabilities sum to at most 1. The only property of $\Omega$ that really matters in the proof of this theorem is that it is prefix free, which means that no exact execution path is a proper prefix of any other.


Theorem 9.1. Let $\Omega$ be the set of all exact execution paths for A on input x. Then

$$
\sum_{\omega \in \Omega} 2^{-|\omega|} \le 1.
$$

Proof. Let $k$ be a non-negative integer. Let $\Omega_k \subseteq \Omega$ be the set of all exact execution paths of length at most $k$, and let $\alpha_k := \sum_{\omega \in \Omega_k} 2^{-|\omega|}$. We shall show below that

$$
\alpha_k \le 1. \tag{9.1}
$$

From this, it will follow that

$$
\sum_{\omega \in \Omega} 2^{-|\omega|} = \lim_{k \to \infty} \alpha_k \le 1.
$$

To prove the inequality (9.1), consider the set $C_k$ of all complete execution paths of length equal to $k$. We claim that

$$
\alpha_k = 2^{-k} |C_k|, \tag{9.2}
$$

from which (9.1) follows, since clearly, $|C_k| \le 2^k$. So now we are left to prove (9.2). Observe that by definition, each $\lambda \in C_k$ extends some $\omega \in \Omega_k$; that is, $\omega$ is a prefix of $\lambda$; moreover, $\omega$ is uniquely determined by $\lambda$, since no exact execution path is a proper prefix of any other exact execution path. Also observe that for each $\omega \in \Omega_k$, if $C_k(\omega)$ is the set of execution paths $\lambda \in C_k$ that extend $\omega$, then $|C_k(\omega)| = 2^{k - |\omega|}$, and by the previous observation, $\{C_k(\omega)\}_{\omega \in \Omega_k}$ is a partition of $C_k$. Thus, we have

$$
\alpha_k = \sum_{\omega \in \Omega_k} 2^{-|\omega|} = \sum_{\omega \in \Omega_k} 2^{-|\omega|} \sum_{\lambda \in C_k(\omega)} 2^{-k + |\omega|} = 2^{-k} \sum_{\omega \in \Omega_k} \sum_{\lambda \in C_k(\omega)} 1 = 2^{-k} |C_k|,
$$

which proves (9.2). □
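Theorem 9.1 is the same phenomenon as the Kraft inequality for prefix-free codes. The Python sketch below (function names are ours) checks the prefix-free property of a finite set of bit strings and computes its total weight $\sum 2^{-|\omega|}$. It uses, as its example, the halting strings “1”, “01”, “001”, … of Example 8.43, truncated at length 10.

```python
# Illustrative sketch; function names and the example sets are ours.
from fractions import Fraction


def is_prefix_free(paths):
    """True if no bit string in `paths` is a proper prefix of another one."""
    ordered = sorted(set(paths))
    # In lexicographic order, every extension of a string w comes right after w
    # (before any string that does not start with w), so checking neighbours suffices.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def total_weight(paths):
    """Sum of 2**-|w| over the distinct bit strings w in `paths`."""
    return sum(Fraction(1, 2 ** len(w)) for w in set(paths))


# Halting strings of Example 8.43 truncated at length 10: "1", "01", "001", ...
halting = ["0" * j + "1" for j in range(10)]
print(is_prefix_free(halting), total_weight(halting))     # weight is 1 - 2**-10
print(is_prefix_free(["0", "01"]))                        # "0" is a prefix of "01"
```

A prefix-free set such as {“00”, “01”, “1”} reaches total weight exactly 1, the equality case allowed by the theorem.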
From the above theorem, if $\Omega$ is the set of all exact execution paths for A on input x, then

$$
\alpha := \sum_{\omega \in \Omega} 2^{-|\omega|} \le 1,
$$

and we say that A halts with probability $\alpha$ on input x. If $\alpha = 1$, we define the distribution $P : \Omega \to [0,1]$ associated with A on input x, where $P(\omega) := 2^{-|\omega|}$ for each $\omega \in \Omega$.

We shall mainly be interested in algorithms that halt with probability 1 on all 
inputs. The following four examples provide some simple criteria that guarantee 
this. 

Example 9.1. Suppose that on input x, A always halts within a finite number of steps, regardless of its random choices. More precisely, this means that there is a bound $\ell$ (depending on A and x), such that all execution paths of length $\ell$ are complete. In this case, we say that A’s running time on input x is strictly bounded by $\ell$, and it is clear that A halts with probability 1 on input x. Moreover, one can much more simply model A’s computation on input x by working with the uniform distribution on execution paths of length $\ell$. □

Example 9.2. Suppose A and B are probabilistic algorithms that both halt with 
probability 1 on all inputs. Using A and B as subroutines, we can form their serial 
composition; that is, we can construct the algorithm 

C(x) : output B(A(x)), 

which on input x, first runs A on input x, obtaining a value y, then runs B on input 
y, obtaining a value z, and finally, outputs z. We claim that C halts with probability 
1 on all inputs. 

For simplicity, we may assume that A places its output y in a location in memory 
where B expects to find its input, and that B places its output in a location in 
memory where C’s output should go. With these assumptions, the program for C is 
obtained by simply concatenating the programs for A and B , making the following 
adjustments: every halt instruction in A’s program is translated into an instruction 
that branches to the first instruction of B’s program, and every target in a branch
instruction in B’s program is increased by the length of A’s program.

Let $\Omega$ be the sample space representing A’s execution on an input x. Each $\omega \in \Omega$ determines an output y, and a corresponding sample space $\Omega'_\omega$ representing B’s execution on input y. The sample space representing C’s execution on input x is $\Omega'' := \{\omega\omega' : \omega \in \Omega,\ \omega' \in \Omega'_\omega\}$, where $\omega\omega'$ is the concatenation of $\omega$ and $\omega'$. We have

$$
\sum_{\omega\omega' \in \Omega''} 2^{-|\omega\omega'|} = \sum_{\omega \in \Omega} 2^{-|\omega|} \sum_{\omega' \in \Omega'_\omega} 2^{-|\omega'|} = \sum_{\omega \in \Omega} 2^{-|\omega|} \cdot 1 = 1,
$$

which shows that C halts with probability 1 on input x. □

Example 9.3. Suppose A, B , and C are probabilistic algorithms that halt with 
probability 1 on all inputs, and that A always outputs either true or false. Then we 
can form the conditional construct 

D(x) : if A(x) then output B(x) else output C(x).

By a calculation similar to that in the previous example, it is easy to see that D 
halts with probability 1 on all inputs. □ 

Example 9.4. Suppose A and B are probabilistic algorithms that halt with proba- 
bility 1 on all inputs, and that A always outputs either true or false. We can form 
the iterative construct 

C(x) : while A(x) do x ← B(x)
       output x.

Algorithm C may or may not halt with probability 1. To analyze C, we define an infinite sequence of algorithms $\{C_n\}_{n=0}^{\infty}$; namely, we define $C_0$ as

$C_0(x)$ : halt,

and for $n > 0$, we define $C_n$ as

$C_n(x)$ : if A(x) then $C_{n-1}(B(x))$.

Essentially, $C_n$ drives C for up to $n$ loop iterations before halting, if necessary, in $C_0$. By the previous three examples, it follows by induction on $n$ that each $C_n$ halts with probability 1 on all inputs. Therefore, we have a well-defined probability distribution for each $C_n$ and each input x.

Consider a fixed input x. For each $n \ge 0$, let $\beta_n$ be the probability that on input x, $C_n$ terminates by executing algorithm $C_0$. Intuitively, $\beta_n$ is the probability that C executes at least $n$ loop iterations; however, this probability is defined with respect to the probability distribution associated with algorithm $C_n$ on input x. It is not hard to see that the sequence $\{\beta_n\}_{n=0}^{\infty}$ is non-increasing, and so the limit $\beta := \lim_{n \to \infty} \beta_n$ exists; moreover, C halts with probability $1 - \beta$ on input x.

On the one hand, if the loop in algorithm C is guaranteed to terminate after a 
finite number of iterations (as in a “for loop”), then C certainly halts with proba- 
bility 1. Indeed, if on input x, there is a bound l (depending on x) such that the 
number of loop iterations is always at most l, then ff+\ = fit +2 = • • • = 0. On the 
other hand, if on input x, C enters into a good, old-fashioned infinite loop, then C 
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certainly does not halt with probability 1, as Po = Pi = • • • = 1. Of course, there 
may be in-between cases, which require further analysis. □ 

We now illustrate the above criteria with a couple of simple, concrete
examples.

Example 9.5. Consider the following algorithm, which models an experiment in
which we toss a fair coin repeatedly until it comes up heads:

repeat
    b ←R {0, 1}
until b = 1

For each positive integer n, let $\beta_n$ be the probability that the algorithm executes
at least n loop iterations, in the sense of Example 9.4. It is not hard to see that
$\beta_n = 2^{-n+1}$, and since $\beta_n \to 0$ as $n \to \infty$, the algorithm halts with probability
1, even though the loop is not guaranteed to terminate after any particular, finite
number of steps. □

Example 9.6. Consider the following algorithm:

i ← 0
repeat
    i ← i + 1
    σ ←R {0, 1}^{×i}
until σ = 0^{×i}

For each positive integer n, let $\beta_n$ be the probability that the algorithm executes
at least n loop iterations, in the sense of Example 9.4. It is not hard to see that

$$\beta_n = \prod_{i=1}^{n-1} \left(1 - 2^{-i}\right) \ge \prod_{i=1}^{n-1} e^{-2^{-i+1}} = e^{-\sum_{i=0}^{n-2} 2^{-i}} \ge e^{-2},$$

where we have made use of the estimate (iii) in §A1. Therefore,

$$\lim_{n \to \infty} \beta_n \ge e^{-2} > 0,$$

and so the algorithm does not halt with probability 1, even though it never falls into 
an infinite loop. □ 
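As a numerical illustration of the bound above (floating point, so a sanity check rather than a proof), the following Python sketch evaluates $\beta_n = \prod_{i=1}^{n-1}(1 - 2^{-i})$ directly and compares it with $e^{-2}$. The function name `beta` is ours.

```python
import math


def beta(n: int) -> float:
    """Probability that the loop of Example 9.6 runs at least n iterations.

    Iteration i halts exactly when its i random bits are all zero, which
    happens with probability 2**-i, so the first n - 1 iterations all fail
    to halt with probability prod_{i=1}^{n-1} (1 - 2**-i).
    """
    p = 1.0
    for i in range(1, n):
        p *= 1.0 - 2.0 ** (-i)
    return p


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 50):
        print(n, beta(n), beta(n) >= math.exp(-2))
```

The values level off a little below 0.29, comfortably above $e^{-2} \approx 0.135$, so the probability of never halting is bounded away from 0.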

9.1.2 Defining the running time and output 

Let A be a probabilistic algorithm that halts with probability 1 on a fixed input x.
We may define the random variable Z that represents A's running time on input x,
and the random variable Y that represents A's output on input x.

Formally, Z and Y are defined using the probability distribution on the sample
space Ω defined in §9.1.2. The sample space Ω consists of all exact execution
paths for A on input x. For each $\omega \in \Omega$, $Z(\omega) := |\omega|$, and $Y(\omega)$ is the output
produced by A on input x, using $\omega$ to drive its execution.

The expected running time of A on input x is defined to be E[Z]. Note that in 
defining the expected running time, we view the input as fixed, rather than drawn 
from some probability distribution. Also note that the expected running time may 
be infinite. 

We say that A runs in expected polynomial time if there exist constants a, b,
and c, such that for all n, and for all inputs x of size n, the expected running time
of A on input x is at most $an^b + c$. We say that A runs in strict polynomial time
if there exist constants a, b, and c, such that for all n, and for all inputs x of size n,
A's running time on input x is strictly bounded by $an^b + c$ (as in Example 9.1).

Example 9.7. Consider again the algorithm in Example 9.5. Let L be the random 
variable that represents the number of loop iterations executed by the algorithm. 
The distribution of L is a geometric distribution, with associated success probability 
1/2 (see Example 8.44). Therefore, E[L] = 2 (see Example 8.46). Let Z be the 
random variable that represents the running time of the algorithm. We have $Z \le cL$,
for some implementation-dependent constant c. Therefore, $E[Z] \le c\,E[L] = 2c$. □

Example 9.8. Consider the following probabilistic algorithm that takes as input a 
positive integer m. It models an experiment in which we toss a fair coin repeatedly 
until it comes up heads m times. 

k ← 0
repeat
    b ←R {0, 1}
    if b = 1 then k ← k + 1
until k = m

Let L be the random variable that represents the number of loop iterations executed
by the algorithm on a fixed input m. We claim that E[L] = 2m. To see this, define
random variables $L_1, \ldots, L_m$, where $L_1$ is the number of loop iterations needed to
get b = 1 for the first time, $L_2$ is the number of additional loop iterations needed
to get b = 1 for the second time, and so on. Clearly, we have $L = L_1 + \cdots + L_m$,
and moreover, $E[L_i] = 2$ for $i = 1, \ldots, m$; therefore, by linearity of expectation, we
have $E[L] = E[L_1] + \cdots + E[L_m] = 2m$. It follows that the expected running time
of this algorithm on input m is O(m). □
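A quick Monte Carlo experiment agrees with the claim E[L] = 2m. This Python sketch assumes the standard `random` module as its bit source (a seeded pseudorandom generator, for reproducibility); the function names are illustrative.

```python
import random


def loop_iterations(m: int, rng: random.Random) -> int:
    """Run the algorithm of Example 9.8 once; return the number L of loop iterations."""
    k = 0
    iterations = 0
    while True:                      # repeat ... until k = m
        iterations += 1
        b = rng.getrandbits(1)       # b <-R {0, 1}
        if b == 1:
            k += 1
        if k == m:
            return iterations


def average_iterations(m: int, trials: int, seed: int = 1) -> float:
    """Empirical estimate of E[L] over the given number of independent runs."""
    rng = random.Random(seed)
    return sum(loop_iterations(m, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(m, average_iterations(m, 10_000), 2 * m)
```

Each $L_i$ is geometric with variance 2, so the empirical mean over many trials should sit within a few hundredths of 2m.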
Example 9.9. Consider the following algorithm:

n ← 0
repeat n ← n + 1, b ←R {0, 1} until b = 1
repeat σ ←R {0, 1}^{×n} until σ = 0^{×n}

The expected running time is infinite (even though it does halt with probability 1).
To see this, define random variables $L_1$ and $L_2$, where $L_1$ is the number of iterations
of the first loop, and $L_2$ is the number of iterations of the second. As in Example 9.7, the distribution of $L_1$ is a geometric distribution with associated success
probability 1/2, and $E[L_1] = 2$. For each $k \ge 1$, the conditional distribution of $L_2$
given $L_1 = k$ is a geometric distribution with associated success probability $1/2^k$,
and so $E[L_2 \mid L_1 = k] = 2^k$. Therefore,

$$E[L_2] = \sum_{k \ge 1} E[L_2 \mid L_1 = k]\, P[L_1 = k] = \sum_{k \ge 1} 2^k \cdot 2^{-k} = \sum_{k \ge 1} 1 = \infty.$$

□

We have presented a fairly rigorous definitional framework for probabilistic 
algorithms, but from now on, we shall generally reason about such algorithms at a 
higher, and more intuitive, level. Nevertheless, all of our arguments can be trans- 
lated into this rigorous framework, the details of which we leave to the interested 
reader. Moreover, all of the algorithms we shall present halt with probability 1 on 
all inputs, but we shall not go into the details of proving this (but the criteria in 
Examples 9.1-9.4 can be used to easily verify this).


Exercise 9.1. Suppose A is a probabilistic algorithm that halts with probability
1 on input x, and let $P : \Omega \to [0, 1]$ be the corresponding probability distribution.
Let $\lambda$ be an execution path of length $\ell$, and assume that no proper prefix of $\lambda$ is
exact. Let $\mathcal{E}_\lambda := \{\omega \in \Omega : \omega \text{ extends } \lambda\}$. Show that $P[\mathcal{E}_\lambda] = 2^{-\ell}$.

Exercise 9.2. Let A be a probabilistic algorithm that on a given input x, halts
with probability 1, and produces an output in the set T. Let P be the corresponding probability distribution, and let Y and Z be random variables representing the
output and running time, respectively. For each $k \ge 0$, let $P_k$ be the uniform
distribution on all execution paths $\lambda$ of length k. We define random variables $Y_k$
and $Z_k$, associated with $P_k$, as follows: if $\lambda$ is complete, we define $Y_k(\lambda)$ to be
the output produced by A, and $Z_k(\lambda)$ to be the actual number of steps executed by
A; otherwise, we define $Y_k(\lambda)$ to be the special value “⊥” and $Z_k(\lambda)$ to be k. For
each $t \in T$, let $p_{tk}$ be the probability (relative to $P_k$) that $Y_k = t$, and let $\mu_k$ be the
expected value (relative to $P_k$) of $Z_k$. Show that:

(a) for each $t \in T$, $P[Y = t] = \lim_{k \to \infty} p_{tk}$;

(b) $E[Z] = \lim_{k \to \infty} \mu_k$.

Exercise 9.3. Let $A_1$ and $A_2$ be probabilistic algorithms. Let B be any probabilistic algorithm that always outputs 0 or 1. For i = 1, 2, let $A'_i$ be the algorithm
that on input x computes and outputs $B(A_i(x))$. Fix an input x, and let $Y_1$ and $Y_2$
be random variables representing the outputs of $A_1$ and $A_2$, respectively, on input
x, and let $Y'_1$ and $Y'_2$ be random variables representing the outputs of $A'_1$ and $A'_2$,
respectively, on input x. Assume that the images of $Y_1$ and $Y_2$ are finite, and let
$\delta := \Delta[Y_1; Y_2]$ be their statistical distance. Show that $|P[Y'_1 = 1] - P[Y'_2 = 1]| \le \delta$.


9.2 Generating a random number from a given interval 

Suppose we want to generate a number uniformly at random from the interval
{0, ..., m − 1}, for a given positive integer m.

If m is a power of 2, say $m = 2^\ell$, then we can do this directly as follows: generate
a random $\ell$-bit string $\sigma$, and convert $\sigma$ to the integer $I(\sigma)$ whose base-2 representation is $\sigma$; that is, if $\sigma = b_{\ell-1} b_{\ell-2} \cdots b_0$, where the $b_i$'s are bits, then

$$I(\sigma) := \sum_{i=0}^{\ell-1} b_i 2^i.$$

In the general case, we do not have a direct way to do this, since we can only
directly generate random bits. But the following algorithm does the job:

Algorithm RN. On input m, where m is a positive integer, do the following, where
$\ell := \lceil \log_2 m \rceil$:

repeat
    σ ←R {0, 1}^{×ℓ}
    y ← I(σ)
until y < m 
output y 
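Algorithm RN translates almost line for line into Python. In this sketch the bit source is an assumption: it defaults to `random.SystemRandom` (operating-system randomness), and any `random.Random` instance can be passed in for reproducible experiments. It uses the fact that $\lceil \log_2 m \rceil$ equals the bit length of m − 1 for every positive integer m.

```python
import random
from typing import Optional


def algorithm_rn(m: int, rng: Optional[random.Random] = None) -> int:
    """Algorithm RN: return an integer uniformly distributed over {0, ..., m - 1}."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if rng is None:
        rng = random.SystemRandom()
    ell = (m - 1).bit_length()       # ell = ceil(log2 m), so m <= 2**ell < 2m
    if ell == 0:                     # m == 1: the only possible output is 0
        return 0
    while True:
        y = rng.getrandbits(ell)     # sigma <-R {0,1}^ell and y <- I(sigma)
        if y < m:                    # until y < m
            return y
```

Reading `getrandbits(ell)` as the integer $I(\sigma)$ of a random $\ell$-bit string is exactly the conversion described above; each pass through the loop succeeds with probability $m/2^\ell > 1/2$.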

Theorem 9.2. The expected running time of Algorithm RN is O(len(m)), and its
output is uniformly distributed over {0, ..., m − 1}.

Proof. Note that $m \le 2^\ell < 2m$. Let L denote the number of loop iterations of this
algorithm, and Z its running time. With every loop iteration, the algorithm halts
with probability $m/2^\ell$, and so the distribution of L is a geometric distribution with
associated success probability $m/2^\ell > 1/2$. Therefore, $E[L] = 2^\ell/m < 2$. Since
$Z \le c \operatorname{len}(m) \cdot L$ for some constant c, it follows that E[Z] = O(len(m)).

Next, we analyze the output distribution. Let Y denote the output of the algorithm. We want to show that Y is uniformly distributed over {0, ..., m − 1}. This
is perhaps intuitively obvious, but let us give a rigorous justification of this claim.
To do this, for i = 1, 2, ..., let $Y_i$ denote the value of y in the ith loop iteration;
for completeness, if the ith loop iteration is not executed, then we define $Y_i := \bot$.
Also, for i = 1, 2, ..., let $H_i$ be the event that the algorithm halts in the ith loop
iteration (i.e., $H_i$ is the event that L = i). Let $t \in \{0, \ldots, m-1\}$ be fixed.

First, by total probability (specifically, the infinite version of (8.9), discussed in
§8.10.2), we have

$$P[Y = t] = \sum_{i \ge 1} P[(Y = t) \cap H_i] = \sum_{i \ge 1} P[(Y_i = t) \cap H_i]. \qquad (9.3)$$

Next, observe that as each loop iteration works the same as any other, it follows
that for each $i \ge 1$, we have

$$P[(Y_i = t) \cap H_i \mid L \ge i] = P[(Y_1 = t) \cap H_1] = P[Y_1 = t] = 2^{-\ell},$$

where the middle equality holds because $t < m$, so the event $Y_1 = t$ implies $H_1$.
Moreover, since $H_i$ implies $L \ge i$, we have

$$\begin{aligned} P[(Y_i = t) \cap H_i] &= P[(Y_i = t) \cap H_i \cap (L \ge i)] \\ &= P[(Y_i = t) \cap H_i \mid L \ge i]\, P[L \ge i] = 2^{-\ell}\, P[L \ge i], \end{aligned}$$

and so using (9.3) and the infinite version of Theorem 8.17 (discussed in §8.10.4),
we have

$$\begin{aligned} P[Y = t] &= \sum_{i \ge 1} P[(Y_i = t) \cap H_i] = \sum_{i \ge 1} 2^{-\ell}\, P[L \ge i] = 2^{-\ell} \sum_{i \ge 1} P[L \ge i] \\ &= 2^{-\ell} \cdot E[L] = 2^{-\ell} \cdot 2^{\ell}/m = 1/m. \end{aligned}$$

This shows that Y is uniformly distributed over {0, ..., m − 1}. □


Of course, by adding an appropriate value to the output of Algorithm RN, we can
generate random numbers uniformly in the interval $\{m_1, \ldots, m_2\}$, for any given $m_1$
and $m_2$. In what follows, we shall denote the execution of this algorithm as

y ←R {m_1, ..., m_2}.

More generally, if T is any finite, non-empty set for which we have an efficient
algorithm whose output is uniformly distributed over T, we shall denote the execution of this algorithm as

y ←R T.

For example, we may write

y ←R ℤ_m

to denote assignment to y of a randomly chosen element of $\mathbb{Z}_m$. Of course, this
is done by running Algorithm RN on input m, and viewing its output as a residue
class modulo m.

We also mention the following alternative algorithm for generating an almost-random number from an interval.

Algorithm RN'. On input m, k, where both m and k are positive integers, do the
following, where $\ell := \lceil \log_2 m \rceil$:

σ ←R {0, 1}^{×(ℓ+k)}
y ← I(σ) mod m
output y

Compared with Algorithm RN, Algorithm RN' has the advantage that there are
no loops: it always halts in a bounded number of steps; however, it has the disadvantage that its output is not, in general (that is, unless m is a power of 2), uniformly distributed over the interval {0, ..., m − 1}.
Nevertheless, the statistical distance between its output distribution and the uniform
distribution on {0, ..., m − 1} is at most $2^{-k}$ (see Example 8.41 in §8.8). Thus,
by choosing k suitably large, we can make the output distribution “as good as 
uniform” for most practical purposes. 
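The following Python sketch implements Algorithm RN' and, for small parameters, computes the exact statistical distance between its output and the uniform distribution by counting how many of the $2^{\ell+k}$ equally likely strings σ land on each residue. The helper names are ours, chosen for illustration, and the seeded `random.Random` source is an assumption.

```python
import random
from fractions import Fraction


def algorithm_rn_prime(m: int, k: int, rng: random.Random) -> int:
    """Algorithm RN': one draw of ell + k random bits, reduced modulo m."""
    ell = (m - 1).bit_length()            # ell = ceil(log2 m)
    return rng.getrandbits(ell + k) % m   # y <- I(sigma) mod m


def rn_prime_distance(m: int, k: int) -> Fraction:
    """Exact statistical distance between RN' output and uniform on {0, ..., m-1}."""
    ell = (m - 1).bit_length()
    n = 2 ** (ell + k)              # number of equally likely bit strings sigma
    q, r = divmod(n, m)             # residues 0..r-1 are hit q+1 times, the rest q times
    total = Fraction(0)
    for t in range(m):
        hits = q + 1 if t < r else q
        total += abs(Fraction(hits, n) - Fraction(1, m))
    return total / 2


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, rn_prime_distance(1000, k), Fraction(1, 2 ** k))
```

Each extra bit of k at least halves the distance, which is the trade-off the text describes: a fixed number of random bits in exchange for a small, controllable deviation from uniform.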

Exercise 9.4. Prove that if m is not a power of 2, there is no probabilistic 
algorithm whose running time is strictly bounded and whose output distribution 
is uniform on {0, ..., m − 1}.

Exercise 9.5. You are to design and analyze an efficient probabilistic algorithm
B that takes as input two integers n and y, with n > 0 and 0 ≤ y ≤ n, and always
outputs 0 or 1. Your algorithm should satisfy the following property. Suppose A is a
probabilistic algorithm that takes two inputs, n and x, and always outputs an integer
between 0 and n. Let Y be a random variable representing A's output on input n, x.
Then for all inputs n, x, we should have P[B(n, A(n, x)) outputs 1] = E[Y]/n.


9.3 The generate and test paradigm 

Algorithm RN, which was discussed in §9.2, is a specific instance of a very general 
type of construction that may be called the “generate and test” paradigm. 

Suppose we have two probabilistic algorithms, A and B, and we combine them
to form a new algorithm

C(x): repeat y ← A(x) until B(x, y)
      output y.

Here, we assume that B(x, y) always outputs either true or false. 
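As a concrete reading of this construct, here is a minimal generic Python version of C. It is a sketch under the assumption that A and B are ordinary Python callables that draw their randomness from whatever source they use internally; it also reports the number L of loop iterations so that the quantities discussed next can be measured.

```python
import random
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def generate_and_test(
    a: Callable[[X], Y], b: Callable[[X, Y], bool], x: X
) -> Tuple[Y, int]:
    """C(x): repeat y <- A(x) until B(x, y); output y.

    Returns the output y together with the number of loop iterations taken.
    """
    iterations = 0
    while True:
        iterations += 1
        y = a(x)          # generate a candidate
        if b(x, y):       # test it; stop on the first accepted candidate
            return y, iterations


if __name__ == "__main__":
    # Rejection sampling: draw y uniformly from T = {0, ..., 99} and
    # accept when y lies in the subset T' of multiples of 7.
    rng = random.Random(2024)
    y, steps = generate_and_test(
        lambda _x: rng.randrange(100), lambda _x, cand: cand % 7 == 0, None
    )
    print(y, steps)
```

With 15 multiples of 7 in {0, ..., 99}, repeated runs should average about 100/15 ≈ 6.7 iterations, and each multiple of 7 should be output about equally often.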

Our goal is to answer the following questions about C for a fixed input x: 
1. Does C halt with probability 1?

2. What is the expected running time of C?

3. What is the output distribution of C?

The answer to the first question is “yes,” provided (i) A halts with probability
1 on input x, (ii) for all possible outputs y of A(x), B halts with probability 1 on
input (x, y), and (iii) for some possible output y of A(x), B(x, y) outputs true with
non-zero probability. We shall assume this from now on.

To address the second and third questions, let us define random variables L, Z,
and Y, where L is the total number of loop iterations of C, Z is the total running
time of C, and Y is the output of C. We can reduce the study of L, Z, and Y to
the study of a single iteration of the main loop. Instead of working with a new
probability distribution that directly models a single iteration of the loop, it is more
convenient to simply study the first iteration of the loop in C. To this end, we define
random variables $Z_1$ and $Y_1$, where $Z_1$ is the running time of the first loop iteration
of C, and $Y_1$ is the value assigned to y in the first loop iteration of C. Also, let $H_1$
be the event that the algorithm halts in the first loop iteration, and let T be the set of
possible outputs of A(x). Note that by the assumption in the previous paragraph,
$P[H_1] > 0$.

Theorem 9.3. Under the assumptions above,

(i) L has a geometric distribution with associated success probability $P[H_1]$,
and in particular, $E[L] = 1/P[H_1]$;

(ii) $E[Z] = E[Z_1]\, E[L] = E[Z_1]/P[H_1]$;

(iii) for every $t \in T$, $P[Y = t] = P[Y_1 = t \mid H_1]$.

Proof. (i) is clear.

To prove (ii), for $i \ge 1$, let $Z_i$ be the time spent by the algorithm in the ith loop
iteration, so that $Z = \sum_{i \ge 1} Z_i$. Now, the conditional distribution of $Z_i$ given $L \ge i$
is (essentially) the same as the distribution of $Z_1$; moreover, $Z_i = 0$ when $L < i$.
Therefore, by the law of total expectation (8.24), for each $i \ge 1$, we have

$$E[Z_i] = E[Z_i \mid L \ge i]\, P[L \ge i] + E[Z_i \mid L < i]\, P[L < i] = E[Z_1]\, P[L \ge i].$$

We may assume that $E[Z_1]$ is finite, as otherwise (ii) is trivially true. By Theorem 8.40 and the infinite version of Theorem 8.17 (discussed in §8.10.4), we have

$$E[Z] = \sum_{i \ge 1} E[Z_i] = \sum_{i \ge 1} E[Z_1]\, P[L \ge i] = E[Z_1] \sum_{i \ge 1} P[L \ge i] = E[Z_1]\, E[L].$$

To prove (iii), for $i \ge 1$, let $Y_i$ be the value assigned to y in loop iteration i, with
$Y_i := \bot$ if $L < i$, and let $H_i$ be the event that the algorithm halts in loop iteration i
(i.e., $H_i$ is the event that L = i). By a calculation similar to that made in the proof
of Theorem 9.2, for each $t \in T$, we have

$$\begin{aligned} P[Y = t] &= \sum_{i \ge 1} P[(Y_i = t) \cap H_i] = \sum_{i \ge 1} P[(Y_i = t) \cap H_i \mid L \ge i]\, P[L \ge i] \\ &= P[(Y_1 = t) \cap H_1] \sum_{i \ge 1} P[L \ge i] = P[(Y_1 = t) \cap H_1] \cdot E[L] \\ &= P[(Y_1 = t) \cap H_1]/P[H_1] = P[Y_1 = t \mid H_1]. \end{aligned}$$

□

Example 9.10. Suppose T is a finite set, and T' is a non-empty, finite subset of T.
Consider the following generalization of Algorithm RN:

repeat
    y ←R T
until y ∈ T'
output y

Here, we assume that we have an algorithm to generate a random element of T (i.e.,
uniformly distributed over T), and an efficient algorithm to test for membership in
T'. Let L denote the number of loop iterations, and Y the output. Also, let $Y_1$ be
the value of y in the first iteration, and $H_1$ the event that the algorithm halts in the
first iteration. Since $Y_1$ is uniformly distributed over T, and $H_1$ is the event that
$Y_1 \in T'$, we have $P[H_1] = |T'|/|T|$. It follows that $E[L] = |T|/|T'|$. As for the
output, for every $t \in T$, we have

$$P[Y = t] = P[Y_1 = t \mid H_1] = P[Y_1 = t \mid Y_1 \in T'],$$

which is 0 if $t \notin T'$ and is $1/|T'|$ if $t \in T'$. It follows that Y is uniformly distributed
over T'. □


Example 9.11. Let us analyze the following algorithm:

repeat
    y ←R {1, 2, 3, 4}
    z ←R {1, ..., y}
until z = 1
output y

With each loop iteration, the algorithm chooses y uniformly at random, and then
decides to halt with probability 1/y. Let L denote the number of loop iterations,
and Y the output. Also, let $Y_1$ be the value of y in the first iteration, and $H_1$ the
event that the algorithm halts in the first iteration. $Y_1$ is uniformly distributed over
{1, ..., 4}, and for t = 1, ..., 4, $P[H_1 \mid Y_1 = t] = 1/t$. Therefore,

$$P[H_1] = \sum_{t=1}^{4} P[H_1 \mid Y_1 = t]\, P[Y_1 = t] = \sum_{t=1}^{4} (1/t)(1/4) = 25/48.$$

Thus, E[L] = 48/25. For the output distribution, for t = 1, ..., 4, we have

$$\begin{aligned} P[Y = t] &= P[Y_1 = t \mid H_1] = P[(Y_1 = t) \cap H_1]/P[H_1] \\ &= P[H_1 \mid Y_1 = t]\, P[Y_1 = t]/P[H_1] = (1/t)(1/4)(48/25) = 12/(25t). \end{aligned}$$

This example illustrates how a probabilistic test can be used to create a biased 
output distribution. □ 
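The calculation above can be reproduced with exact rational arithmetic. This Python sketch simply encodes the formulas of Theorem 9.3 for this example using the standard `fractions` module; it is a check of the arithmetic, not a simulation, and the function name is illustrative.

```python
from fractions import Fraction
from typing import Dict, Tuple


def example_9_11() -> Tuple[Fraction, Fraction, Dict[int, Fraction]]:
    """Return P[H_1], E[L], and the output distribution P[Y = t] for Example 9.11."""
    p_y1 = Fraction(1, 4)                                   # Y_1 uniform on {1, 2, 3, 4}
    halt_given = {t: Fraction(1, t) for t in range(1, 5)}   # P[H_1 | Y_1 = t] = 1/t
    p_h1 = sum(halt_given[t] * p_y1 for t in range(1, 5))   # total probability
    expected_iterations = 1 / p_h1                          # Theorem 9.3 (i)
    output = {t: halt_given[t] * p_y1 / p_h1 for t in range(1, 5)}  # Theorem 9.3 (iii)
    return p_h1, expected_iterations, output


if __name__ == "__main__":
    p_h1, e_l, dist = example_9_11()
    print(p_h1, e_l, dist)
```

The output probabilities come out as 12/25, 6/25, 4/25, and 3/25 for t = 1, 2, 3, 4: they sum to 1 and fall off like 1/t, which is the bias the example describes.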

Exercise 9.6. Design and analyze an efficient probabilistic algorithm that takes
as input an integer n ≥ 2, and outputs a random element of $\mathbb{Z}_n^*$.

Exercise 9.7. Consider the following probabilistic algorithm that takes as input
a positive integer m:

S ← ∅
repeat
    n ←R {1, ..., m}, S ← S ∪ {n}
until |S| = m

Show that the expected number of iterations of the main loop is ~ m log m.

Exercise 9.8. Consider the following algorithm (which takes no input):

j ← 1
repeat
    j ← j + 1, n ←R {0, ..., j − 1}
until n = 0

Show that the expected running time of this algorithm is infinite (even though it
does halt with probability 1).

Exercise 9.9. Now consider the following modification to the algorithm in the
previous exercise:

j ← 2
repeat
    j ← j + 1, n ←R {0, ..., j − 1}
until n = 0 or n = 1

Show that the expected running time of this algorithm is finite.
Exercise 9.10. Consider again Algorithm RN in §9.2. On input m, this algorithm
may use up to $\approx 2\ell$ random bits on average, where $\ell := \lceil \log_2 m \rceil$. Indeed, each
loop iteration generates $\ell$ random bits, and the expected number of loop iterations
will be $\approx 2$ when $m \approx 2^{\ell-1}$. This exercise asks you to analyze an alternative
algorithm that uses just $\ell + O(1)$ random bits on average, which may be useful in
settings where random bits are a scarce resource. This algorithm runs as follows:

repeat
    y ← 0, i ← 1
    while y < m and i ≤ ℓ do
(*)     b ←R {0, 1}, y ← y + 2^{ℓ−i} b, i ← i + 1
until y < m
output y

Define random variables K and Y, where K is the number of times the line marked
(*) is executed, and Y is the output. Show that $E[K] = \ell + O(1)$ and that Y is
uniformly distributed over {0, ..., m − 1}.

Exercise 9.11. Let S and T be finite, non-empty sets, and let $f : S \times T \to \{-1, 0, 1\}$ be a function. Consider the following probabilistic algorithm:

x ←R S, y ←R T
if f(x, y) = 0 then
    y' ← y
else
    y' ←R T
(*)     while f(x, y') = 0 do y' ←R T

Here, we assume we have algorithms to generate random elements in S and T, and
a deterministic algorithm to evaluate f. Define random variables X, Y, Y', and L,
where X is the value assigned to x, Y is the value assigned to y, Y' is the final value
assigned to y', and L is the number of times that f is evaluated at the line marked
(*).

(a) Show that (X, Y') has the same distribution as (X, Y).

(b) Show that E[L] ≤ 1.

(c) Give an explicit example of S, T, and f, such that if the line marked (*) is
deleted, then E[f(X, Y)] > E[f(X, Y')] = 0.
9.4 Generating a random prime

Suppose we are given an integer m ≥ 2, and want to generate a random prime
between 2 and m. One way to proceed is simply to generate random numbers
until we get a prime. This idea will work, assuming the existence of an efficient,
deterministic algorithm IsPrime that determines whether or not a given integer is
prime. We will present such an algorithm later, in Chapter 21. For the moment,
we shall just assume we have such an algorithm, and use it as a “black box.” Let
us assume that on inputs of bit length at most $\ell$, IsPrime runs in time at most $\tau(\ell)$.
Let us also assume (quite reasonably) that $\tau(\ell) = \Omega(\ell)$.

Algorithm RP. On input m, where m is an integer ≥ 2, do the following:

repeat
    n ←R {2, ..., m}
until IsPrime(n)
output n

We now wish to analyze the running time and output distribution of Algorithm RP on an input m, where $\ell := \operatorname{len}(m)$. This is easily done, using the results of
§9.3, and more specifically, by Example 9.10. The expected number of loop iterations performed by Algorithm RP is $(m - 1)/\pi(m)$, where $\pi(m)$ is the number of
primes up to m. By Chebyshev's theorem (Theorem 5.1), $\pi(m) = \Theta(m/\ell)$. It follows that the expected number of loop iterations is $O(\ell)$. Furthermore, the expected
running time of any one loop iteration is $O(\tau(\ell))$ (the expected running time for
generating n is $O(\ell)$, and this is where we use the assumption that $\tau(\ell) = \Omega(\ell)$).
It follows that the expected total running time is $O(\ell\,\tau(\ell))$. As for the output, it is
clear that it is uniformly distributed over the set of primes up to m. 
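A minimal Python sketch of Algorithm RP follows. The text treats IsPrime as a black box; here, purely as a stand-in so that the code runs, it is replaced by trial division, which is deterministic and correct but takes time exponential in len(n), so the sketch is only suitable for small m. Sampling uses `random.Random.randint`, which returns a uniform integer from a closed interval; the random source is an assumption.

```python
import random
from typing import Optional


def is_prime(n: int) -> bool:
    """Stand-in for IsPrime: trial division (correct, but only practical for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def algorithm_rp(m: int, rng: Optional[random.Random] = None) -> int:
    """Algorithm RP: a prime uniformly distributed over the primes in {2, ..., m}."""
    if m < 2:
        raise ValueError("m must be at least 2")
    if rng is None:
        rng = random.SystemRandom()
    while True:
        n = rng.randint(2, m)        # n <-R {2, ..., m}
        if is_prime(n):              # until IsPrime(n)
            return n
```

For m = 50 there are 15 primes, so $(m - 1)/\pi(m) = 49/15 \approx 3.3$ loop iterations are expected per call.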


9.4.1 Using a probabilistic primality test 

In the above analysis, we assumed that IsPrime was an efficient, deterministic
algorithm. While such an algorithm exists, there are in fact simpler and far more
efficient primality tests that are probabilistic. We shall discuss such an algorithm in
detail in the next chapter. This algorithm (like several other probabilistic primality
tests) has one-sided error, in the following sense: if the input n is prime, then
the algorithm always outputs true; otherwise, if n is composite, the output may be
true or false, but the probability that the output is true is at most ε, where ε is a
very small number (the algorithm may be easily tuned to make ε quite small, e.g.,
$2^{-100}$).

Let us analyze the behavior of Algorithm RP under the assumption that IsPrime
is implemented by a probabilistic algorithm with an error probability for composite
inputs bounded by ε, as discussed in the previous paragraph. Let $\tau(\ell)$ be a bound
on the expected running time of this algorithm for all inputs of bit length at most $\ell$.
Again, we assume that $\tau(\ell) = \Omega(\ell)$.

We use the technique developed in §9.3. Consider a fixed input m, and let
$\ell := \operatorname{len}(m)$. Let L, Z, and N be random variables representing, respectively, the
number of loop iterations, the total running time, and output of Algorithm RP on
input m. Also, let $Z_1$ be the random variable representing the running time of
the first loop iteration, and let $N_1$ be the random variable representing the value
assigned to n in the first loop iteration. Let $H_1$ be the event that the algorithm halts
in the first loop iteration, and let $C_1$ be the event that $N_1$ is composite.

Clearly, $N_1$ is uniformly distributed over {2, ..., m}. Also, by our assumptions
about IsPrime, we have

$$E[Z_1] = O(\tau(\ell)),$$

and moreover, for each $j \in \{2, \ldots, m\}$, we have

$$P[H_1 \mid N_1 = j] \le \varepsilon \quad \text{if } j \text{ is composite},$$

and

$$P[H_1 \mid N_1 = j] = 1 \quad \text{if } j \text{ is prime}.$$

In particular,

$$P[H_1 \mid C_1] \le \varepsilon \quad \text{and} \quad P[H_1 \mid \overline{C_1}] = 1.$$

It follows that

$$P[H_1] = P[H_1 \mid C_1]\, P[C_1] + P[H_1 \mid \overline{C_1}]\, P[\overline{C_1}] \ge P[H_1 \mid \overline{C_1}]\, P[\overline{C_1}] = \pi(m)/(m-1).$$

Therefore,

$$E[L] \le (m-1)/\pi(m) = O(\ell)$$

and

$$E[Z] = E[L]\, E[Z_1] = O(\ell\,\tau(\ell)).$$

That takes care of the running time. Now consider the output. For every
$j \in \{2, \ldots, m\}$, we have

$$P[N = j] = P[N_1 = j \mid H_1].$$

If j is prime, then

$$P[N = j] = P[N_1 = j \mid H_1] = \frac{P[(N_1 = j) \cap H_1]}{P[H_1]} = \frac{P[H_1 \mid N_1 = j]\, P[N_1 = j]}{P[H_1]} = \frac{1}{(m-1)\, P[H_1]}.$$


Thus, every prime is output with equal probability; however, the algorithm may
also output a number that is not prime. Let us bound the probability of this
event. One might be tempted to say that this happens with probability at most
ε; however, in drawing such a conclusion, one would be committing the fallacy of
Example 8.13: to correctly analyze the probability that Algorithm RP mistakenly
outputs a composite, one must take into account the rate of incidence of the “primality disease,” as well as the error rate of the test for this disease. Indeed, if C is
the event that N is composite, then we have

$$P[C] = P[C_1 \mid H_1] = \frac{P[C_1 \cap H_1]}{P[H_1]} = \frac{P[H_1 \mid C_1]\, P[C_1]}{P[H_1]} \le \frac{\varepsilon}{P[H_1]} \le \frac{\varepsilon}{\pi(m)/(m-1)} = O(\ell\varepsilon).$$


Another way of analyzing the output distribution of Algorithm RP is to consider
its statistical distance Δ from the uniform distribution on the set of primes between
2 and m. As we have already argued, every prime between 2 and m is equally likely
to be output, and in particular, any fixed prime is output with probability at most
$1/\pi(m)$. It follows from Theorem 8.31 that $\Delta = P[C] = O(\ell\varepsilon)$.


9.4.2 Generating a random ℓ-bit prime

Instead of generating a random prime between 2 and m, we may instead want to
generate a random $\ell$-bit prime, that is, a prime between $2^{\ell-1}$ and $2^\ell - 1$. Bertrand's
postulate (Theorem 5.8) tells us that there exist such primes for every $\ell \ge 2$,
and that in fact, there are $\Omega(2^\ell/\ell)$ such primes. Because of this, we can modify
Algorithm RP, so that each candidate n is chosen at random from the interval
$\{2^{\ell-1}, \ldots, 2^\ell - 1\}$, and all of the results for that algorithm carry over essentially
without change. In particular, the expected number of trials until the algorithm
halts is $O(\ell)$, and if a probabilistic primality test as in §9.4.1 is used, with an error
probability of ε, the probability that the output is not prime is $O(\ell\varepsilon)$.
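The following Python sketch combines the ℓ-bit variant of Algorithm RP with a one-sided probabilistic primality test of the kind described in §9.4.1. The choice of test is ours, not this section's: we swap in the standard Miller–Rabin test, for which (by the usual bound) at most a quarter of the candidate bases fool the test on a composite, so `rounds` independent rounds give $\varepsilon \le 4^{-\text{rounds}}$. The default of 40 rounds and the seeded random source are illustrative assumptions.

```python
import random
from typing import Optional

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def miller_rabin(n: int, rounds: int, rng: random.Random) -> bool:
    """One-sided probabilistic primality test (Miller-Rabin).

    Always returns True when n is prime; for composite n, returns True with
    probability at most 4**(-rounds).
    """
    if n < 2:
        return False
    for p in SMALL_PRIMES:               # quick trial division by small primes
        if n % p == 0:
            return n == p
    d, s = n - 1, 0                      # write n - 1 = 2**s * d with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base in {2, ..., n - 2}
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                 # a witnesses that n is composite
    return True


def random_ell_bit_prime(
    ell: int, rounds: int = 40, rng: Optional[random.Random] = None
) -> int:
    """Algorithm RP restricted to the interval {2**(ell-1), ..., 2**ell - 1}."""
    if ell < 2:
        raise ValueError("ell must be at least 2")
    if rng is None:
        rng = random.SystemRandom()
    while True:
        n = rng.randint(2 ** (ell - 1), 2 ** ell - 1)   # candidate with exactly ell bits
        if miller_rabin(n, rounds, rng):
            return n
```

A 64-bit request is far beyond what trial division could confirm, which is exactly the situation in which a fast probabilistic test pays off.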


Exercise 9.12. Suppose Algorithm RP is implemented using an imperfect random number generator, so that the statistical distance between the output distribution of the random number generator and the uniform distribution on {2, ..., m} is
equal to δ (e.g., Algorithm RN' in §9.2). Assume that $2\delta \le \pi(m)/(m-1)$. Also,
let μ denote the expected number of iterations of the main loop of Algorithm RP,
let Δ denote the statistical distance between its output distribution and the uniform
distribution on the primes up to m, and let $\ell := \operatorname{len}(m)$.

(a) Assuming the primality test is deterministic, show that $\mu = O(\ell)$ and
$\Delta = O(\delta\ell)$.

(b) Assuming the primality test is probabilistic, with one-sided error ε, as in
§9.4.1, show that $\mu = O(\ell)$ and $\Delta = O((\delta + \varepsilon)\ell)$.


9.5 Generating a random non-increasing sequence 

The following algorithm will be used in the next section as a fundamental subroutine in a beautiful algorithm (Algorithm RFN) that generates random numbers in
factored form.

Algorithm RS. On input m, where m is an integer ≥ 2, do the following:

n_0 ← m
k ← 0
repeat
    k ← k + 1
    n_k ←R {1, ..., n_{k−1}}
until n_k = 1
output (n_1, ..., n_k)
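Algorithm RS is short enough to transcribe directly. In this Python sketch the random source is again an assumption (`random.SystemRandom` by default, or any seeded `random.Random`), and each $n_k$ is drawn with `randint`, which includes both endpoints of {1, ..., n_{k−1}}.

```python
import random
from typing import List, Optional


def algorithm_rs(m: int, rng: Optional[random.Random] = None) -> List[int]:
    """Algorithm RS: a random non-increasing sequence (n_1, ..., n_k) with n_k = 1."""
    if m < 2:
        raise ValueError("m must be at least 2")
    if rng is None:
        rng = random.SystemRandom()
    seq: List[int] = []
    n = m                            # n_0 <- m
    while True:
        n = rng.randint(1, n)        # n_k <-R {1, ..., n_{k-1}}
        seq.append(n)
        if n == 1:                   # until n_k = 1
            return seq
```

Every output ends in a single 1, and its length averages $1 + \sum_{j=1}^{m-1} 1/j$, the value derived in §9.5.2.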

We analyze first the output distribution, and then the running time. 


9.5.1 Analysis of the output distribution 
Let $N_1, N_2, \ldots$ be random variables denoting the choices of $n_1, n_2, \ldots$ (for completeness, define $N_i := 1$ if loop i is never entered).

A particular output of the algorithm is a non-increasing sequence $(j_1, \ldots, j_h)$,
where $j_1 \ge j_2 \ge \cdots \ge j_{h-1} > j_h = 1$. For any such sequence, we have

$$P\Big[\bigcap_{v=1}^{h} (N_v = j_v)\Big] = P[N_1 = j_1] \cdot \prod_{v=2}^{h} P\Big[N_v = j_v \;\Big|\; \bigcap_{w<v} (N_w = j_w)\Big] = \frac{1}{m} \cdot \frac{1}{j_1} \cdots \frac{1}{j_{h-1}}. \qquad (9.4)$$

This completely describes the output distribution, in the sense that we have
determined the probability with which each non-increasing sequence appears as
an output. However, there is another way to characterize the output distribution
that is significantly more useful. For j = 2, ..., m, define the random variable $O_j$
to be the number of occurrences of the integer j in the output sequence. The $O_j$'s
determine the $N_i$'s, and vice versa. Indeed, $O_m = e_m, \ldots, O_2 = e_2$ if and only if the
output of the algorithm is the sequence

$$(\underbrace{m, \ldots, m}_{e_m \text{ times}}, \underbrace{m-1, \ldots, m-1}_{e_{m-1} \text{ times}}, \ldots, \underbrace{2, \ldots, 2}_{e_2 \text{ times}}, 1).$$

From (9.4), we can therefore directly compute

$$P\Big[\bigcap_{j=2}^{m} (O_j = e_j)\Big] = \frac{1}{m} \prod_{j=2}^{m} \frac{1}{j^{e_j}}. \qquad (9.5)$$

Moreover, we can write 1/m as a telescoping product,

$$\frac{1}{m} = \frac{m-1}{m} \cdot \frac{m-2}{m-1} \cdots \frac{2}{3} \cdot \frac{1}{2} = \prod_{j=2}^{m} (1 - 1/j),$$

and so re-write (9.5) as

$$P\Big[\bigcap_{j=2}^{m} (O_j = e_j)\Big] = \prod_{j=2}^{m} \frac{1}{j^{e_j}} (1 - 1/j).$$

Notice that for j = 2, ..., m,

$$\sum_{e_j \ge 0} \frac{1}{j^{e_j}} (1 - 1/j) = 1, \qquad (9.6)$$

and so by (a discrete version of) Theorem 8.7, the family of random variables
$\{O_j\}_{j=2}^{m}$ is mutually independent, and for each j = 2, ..., m and each integer
$e_j \ge 0$, we have

$$P[O_j = e_j] = \frac{1}{j^{e_j}} (1 - 1/j). \qquad (9.7)$$

In summary, we have shown:

the family $\{O_j\}_{j=2}^{m}$ is mutually independent, where for each
j = 2, ..., m, the variable $O_j + 1$ has a geometric distribution with
an associated success probability of 1 − 1/j.
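The identity behind (9.5) through (9.7) can be spot-checked with exact arithmetic. This Python sketch (all helper names are illustrative) builds the output sequence determined by a choice of counts $e_j$, evaluates its probability via (9.4), and compares the result with the product form $\prod_{j=2}^{m} j^{-e_j}(1 - 1/j)$.

```python
from fractions import Fraction
from typing import Dict, List


def sequence_from_counts(m: int, e: Dict[int, int]) -> List[int]:
    """The output sequence with e[j] copies of each j in {2, ..., m}, then a final 1."""
    seq: List[int] = []
    for j in range(m, 1, -1):
        seq.extend([j] * e.get(j, 0))
    seq.append(1)
    return seq


def prob_by_sequence(m: int, seq: List[int]) -> Fraction:
    """Formula (9.4): (1/m) * (1/j_1) * ... * (1/j_{h-1})."""
    p = Fraction(1, m)
    for j in seq[:-1]:
        p *= Fraction(1, j)
    return p


def prob_by_counts(m: int, e: Dict[int, int]) -> Fraction:
    """Product form: prod_{j=2}^{m} j**(-e_j) * (1 - 1/j)."""
    p = Fraction(1)
    for j in range(2, m + 1):
        p *= Fraction(1, j ** e.get(j, 0)) * (1 - Fraction(1, j))
    return p


if __name__ == "__main__":
    e = {6: 1, 4: 2, 2: 3}
    seq = sequence_from_counts(6, e)
    print(seq, prob_by_sequence(6, seq), prob_by_counts(6, e))
```

The two computations agree because the telescoping product $\prod_{j=2}^m (1 - 1/j)$ is exactly the leading factor 1/m in (9.4).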


Another, perhaps more intuitive, analysis of the distribution of the $O_j$'s runs as
follows. Conditioning on the event $O_m = e_m, \ldots, O_{j+1} = e_{j+1}$, one sees that the
value of $O_j$ is the number of times the value j appears in the sequence $N_i, N_{i+1}, \ldots$,
where $i = e_m + \cdots + e_{j+1} + 1$; moreover, in this conditional probability distribution, it
is not too hard to convince oneself that $N_i$ is uniformly distributed over {1, ..., j}.
Hence the probability that $O_j = e_j$ in this conditional probability distribution is the
probability of getting a run of exactly $e_j$ copies of the value j in an experiment in
which we successively choose numbers between 1 and j at random, and this latter
probability is clearly $j^{-e_j}(1 - 1/j)$.


9.5.2 Analysis of the running time 

Let $\ell := \operatorname{len}(m)$, and let K be the random variable that represents the number of
loop iterations performed by the algorithm. With the random variables $O_2, \ldots, O_m$
defined as above, we can write $K = 1 + \sum_{j=2}^{m} O_j$. Moreover, for each j, $O_j + 1$ has
a geometric distribution with associated success probability 1 − 1/j, and hence

$$E[O_j] = \frac{1}{1 - 1/j} - 1 = \frac{1}{j-1}.$$

Thus,

$$E[K] = 1 + \sum_{j=2}^{m} E[O_j] = 1 + \sum_{j=1}^{m-1} \frac{1}{j} \le 2 + \int_{1}^{m} \frac{dy}{y} = \log m + 2,$$
where we have estimated the sum by an integral (see §A5). 
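For concrete m, the exact value $E[K] = 1 + \sum_{j=1}^{m-1} 1/j$ can be compared with the bound $\log m + 2$ (natural logarithm, as in the integral above). A small Python check, with an illustrative function name:

```python
import math
from fractions import Fraction


def expected_rs_iterations(m: int) -> Fraction:
    """E[K] = 1 + sum_{j=2}^{m} E[O_j], where E[O_j] = 1/(j - 1)."""
    return 1 + sum((Fraction(1, j - 1) for j in range(2, m + 1)), Fraction(0))


if __name__ == "__main__":
    for m in (2, 10, 100, 1000):
        exact = expected_rs_iterations(m)
        print(m, float(exact), math.log(m) + 2)
```

The gap between the two columns stays below 1 and approaches $1 - \gamma \approx 0.42$ for large m (γ being Euler's constant), so the integral estimate is fairly tight.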

Intuitively, this is roughly as we would expect, since with probability 1/2, each
successive $n_i$ is at most one half as large as its predecessor, and so after $O(\ell)$ steps,
we expect to reach 1.

Let Z be the total running time of the algorithm. We may bound E[Z] using
essentially the same argument that was used in the proof of Theorem 9.3. First,
write $Z = \sum_{i \ge 1} Z_i$, where $Z_i$ is the time spent in the ith loop iteration. Each loop
iteration, if executed at all, runs in expected time $O(\ell)$. That is, there exists a
constant c, such that for each $i \ge 1$,

$$E[Z_i \mid K \ge i] \le c\ell \quad \text{and} \quad E[Z_i \mid K < i] = 0.$$

Thus,

$$E[Z_i] = E[Z_i \mid K \ge i]\, P[K \ge i] + E[Z_i \mid K < i]\, P[K < i] \le c\ell\, P[K \ge i],$$

and so

$$E[Z] = \sum_{i \ge 1} E[Z_i] \le c\ell \sum_{i \ge 1} P[K \ge i] = c\ell\, E[K] = O(\ell^2).$$

In summary, we have shown: 

the expected running time of Algorithm RS on l-bit inputs is 0(l 2 ). 


Exercise 9.13. Show that when Algorithm RS runs on input $m$, the expected number of (not necessarily distinct) primes in the output sequence is $\sim \log\log m$.


9.6 Generating a random factored number

We now present an efficient algorithm that generates a random factored number. 
That is, on input $m \ge 2$, the algorithm generates a number $y$ uniformly distributed over the interval $\{1, \ldots, m\}$, but instead of the usual output format for such a number $y$, the output consists of the prime factorization of $y$.

As far as anyone knows, there are no efficient algorithms for factoring large
numbers, despite years of active research in search of such an algorithm. So our 
algorithm to generate a random factored number will not work by generating a 
random number and then factoring it. 

Our algorithm will use Algorithm RS in §9.5 as a subroutine. In addition, as 
we did in §9.4, we shall assume the existence of an efficient, deterministic primal- 
ity test IsPrime. In the analysis of the algorithm, we shall make use of Mertens’ 
theorem, which we proved in Chapter 5 (Theorem 5.13). 

Algorithm RFN. On input $m$, where $m$ is an integer $\ge 2$, do the following:

        repeat
              run Algorithm RS on input m, obtaining (n_1, ..., n_k)
        (*)   let (p_1, ..., p_r) be the subsequence of primes in (n_1, ..., n_k)
        (**)  y ← p_1 ··· p_r
              if y ≤ m then
                  x ←R {1, ..., m}
                  if x ≤ y then output (p_1, ..., p_r) and halt
        forever

Notes:

(*) For $i = 1, \ldots, k - 1$, the number $n_i$ is tested for primality using algorithm IsPrime. The sequence $(n_1, \ldots, n_k)$ may contain duplicates, and if these are prime, they appear in $(p_1, \ldots, p_r)$ with the same multiplicity.

(**) We assume that the product is computed by a simple iterative procedure that halts as soon as the partial product exceeds $m$. This ensures that the time spent forming the product is always $O(\operatorname{len}(m)^2)$, which simplifies the analysis.
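The following Python sketch shows the structure of Algorithm RFN. It assumes that Algorithm RS (§9.5) starts from $n_0 = m$ and repeatedly chooses $n_i$ uniformly from $\{1, \ldots, n_{i-1}\}$ until it reaches 1. It uses naive trial division as a stand-in for the deterministic test IsPrime. The helper names `random_sequence` and `is_prime` are hypothetical and chosen only for this illustration.

```python
import math
import random
from typing import List


def is_prime(n: int) -> bool:
    """Stand-in for IsPrime: deterministic trial division (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def random_sequence(m: int, rng: random.Random) -> List[int]:
    """Sketch of Algorithm RS: each n_i is uniform on {1, ..., n_(i-1)}, n_0 = m."""
    seq, n = [], m
    while True:
        n = rng.randint(1, n)
        seq.append(n)
        if n == 1:
            return seq


def random_factored_number(m: int, rng: random.Random) -> List[int]:
    """Sketch of Algorithm RFN: prime factors of a y uniform on {1, ..., m}."""
    if m < 2:
        raise ValueError("m must be at least 2")
    while True:
        seq = random_sequence(m, rng)
        # (*) primes among n_1, ..., n_(k-1); the last entry n_k is always 1
        primes = [n for n in seq[:-1] if is_prime(n)]
        # (**) form the product, stopping as soon as it exceeds m
        y = 1
        for p in primes:
            y *= p
            if y > m:
                break
        # x is drawn only when y <= m; halt when x <= y
        if y <= m and rng.randint(1, m) <= y:
            return primes  # factorization of y (empty list when y = 1)


if __name__ == "__main__":
    rng = random.Random(2024)
    factors = random_factored_number(10**6, rng)
    print(factors, math.prod(factors))
```

Because the sequence produced by Algorithm RS is non-increasing, the primes come out in non-increasing order.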

We now analyze the running time and output distribution of Algorithm RFN on input $m$, using the generate-and-test paradigm discussed in §9.3. Here, the “generate” part consists of the first two lines of the loop body, which generate the sequence $(p_1, \ldots, p_r)$, and the “test” part consists of the last four lines of the loop body.

Let $\ell := \operatorname{len}(m)$. We assume that each call to IsPrime takes time at most $\tau(\ell)$, and for simplicity, we assume $\tau(\ell) = \Omega(\ell)$.

Let $K_1$ be the value of $k$ in the first loop iteration, $Z_1$ be the running time of the first loop iteration, $Y_1$ be the value of $y$ in the first loop iteration, and $H_1$ be the event that the algorithm halts in the first loop iteration. Also, let $Z$ be the total running time of the algorithm, and let $Y$ be the value of $y$ in the last loop iteration (i.e., the number whose factorization is output).

We begin with three preliminary calculations. 

First, let $t = 1, \ldots, m$ be a fixed integer, and let us calculate the probability that $Y_1 = t$. Suppose $t = \prod_{p \le m} p^{e_p}$ is the prime factorization of $t$. Let $O_2, \ldots, O_m$ be random variables as defined in §9.5, so that $O_j$ represents the number of occurrences of $j$ in the output sequence of the first invocation of Algorithm RS. Then $Y_1 = t$ if and only if $O_p = e_p$ for all primes $p \le m$, and so by the analysis in §9.5, we have

$$P[Y_1 = t] = \prod_{p \le m} p^{-e_p}(1 - 1/p) = \frac{g(m)}{t},$$

where

$$g(m) := \prod_{p \le m} (1 - 1/p).$$

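Second, we calculate $P[H_1]$. Observe that for $t = 1, \ldots, m$, we have

$$P[H_1 \mid Y_1 = t] = \frac{t}{m},$$

since, given $Y_1 = t \le m$, the algorithm halts exactly when the uniformly chosen $x$ satisfies $x \le t$. Since $H_1$ cannot occur when $Y_1 > m$, it follows that

$$P[H_1] = \sum_{t=1}^{m} P[H_1 \mid Y_1 = t]\, P[Y_1 = t] = \sum_{t=1}^{m} \frac{t}{m} \cdot \frac{g(m)}{t} = g(m).$$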

Third, let $t = 1, \ldots, m$ be a fixed integer, and let us calculate the conditional probability that $Y_1 = t$ given $H_1$. We have

$$P[Y_1 = t \mid H_1] = \frac{P[(Y_1 = t) \cap H_1]}{P[H_1]} = \frac{P[H_1 \mid Y_1 = t]\, P[Y_1 = t]}{P[H_1]} = \frac{(t/m)(g(m)/t)}{g(m)} = \frac{1}{m}.$$

We may now easily analyze the output distribution of Algorithm RFN. By Theorem 9.3, for each $t = 1, \ldots, m$, we have

$$P[Y = t] = P[Y_1 = t \mid H_1] = \frac{1}{m},$$

which shows that the output is indeed uniformly distributed over all integers in $\{1, \ldots, m\}$, represented in factored form.

Finally, we analyze the expected running time of Algorithm RFN. It is easy to see that $E[Z_1] = O(E[K_1]\tau(\ell) + \ell^2)$. By the analysis in §9.5, we know that $E[K_1] = O(\ell)$, and hence $E[Z_1] = O(\ell\tau(\ell))$. By Theorem 9.3, we have

$$E[Z] = E[Z_1]/P[H_1] = E[Z_1]\, g(m)^{-1}.$$

By Mertens' theorem, $g(m)^{-1} = O(\ell)$. We conclude that

$$E[Z] = O(\ell^2\tau(\ell)).$$

That is, the expected running time of Algorithm RFN is $O(\ell^2\tau(\ell))$.
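To see the $O(\ell)$ growth of $g(m)^{-1}$ numerically, the sketch below computes $g(m) = \prod_{p \le m}(1 - 1/p)$ with a simple sieve and prints $g(m)\log m$. A sharper form of Mertens' theorem, known from the literature and not proved in this text, says that this quantity tends to $e^{-\gamma_E} \approx 0.5615$, where $\gamma_E \approx 0.5772$ is Euler's constant. The helper names are ours.

```python
import math


def primes_up_to(m: int) -> list:
    """Sieve of Eratosthenes: all primes p <= m."""
    if m < 2:
        return []
    sieve = bytearray([1]) * (m + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(m) + 1):
        if sieve[i]:
            # cross off i*i, i*i + i, ..., up to m
            sieve[i * i :: i] = bytearray(len(range(i * i, m + 1, i)))
    return [i for i in range(m + 1) if sieve[i]]


def g(m: int) -> float:
    """g(m) = prod_{p <= m} (1 - 1/p), as in the analysis of Algorithm RFN."""
    result = 1.0
    for p in primes_up_to(m):
        result *= 1.0 - 1.0 / p
    return result


if __name__ == "__main__":
    euler_gamma = 0.5772156649015329
    for m in (10**2, 10**4, 10**6):
        print(m, g(m) * math.log(m), math.exp(-euler_gamma))
```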


9.6.1 Using a probabilistic primality test (*) 

Analogous to the discussion in §9.4.1, we can analyze the behavior of Algorithm RFN under the assumption that IsPrime is a probabilistic algorithm which may erroneously indicate that a composite number is prime with probability at most $\varepsilon$. Let $\ell := \operatorname{len}(m)$, and as we did in §9.4.1, let $\tau(\ell)$ be a bound on the expected running time of IsPrime for all inputs of bit length at most $\ell$ (and again, assume $\tau(\ell) = \Omega(\ell)$).

The random variables $K_1, Z_1, Y_1, Z, Y$ and the event $H_1$ are defined as above. Let us also define $F_1$ to be the event that the primality test makes a mistake in the first loop iteration, and $F$ to be the event that the output of the algorithm is not a list of primes. Let $\delta := P[F_1]$.

Again, we begin with three preliminary calculations. 

First, let $t = 1, \ldots, m$ be fixed and let us calculate $P[(Y_1 = t) \cap \overline{F}_1]$. To do this, define the random variable $Y_1'$ to be the product of the actual primes among the output of the first invocation of Algorithm RS (because the primality test may err, $Y_1$ may contain additional factors). Evidently, the events $(Y_1 = t) \cap \overline{F}_1$ and $(Y_1' = t) \cap \overline{F}_1$ are the same. Moreover, we claim that the events $Y_1' = t$ and $F_1$ are independent. To see this, recall that the family $\{O_j\}_{j=2}^{m}$ is mutually independent. Also observe that the event $Y_1' = t$ depends only on the random variables $O_j$ where $j$ is prime, while the event $F_1$ depends only on the random variables $O_j$ where $j$ is composite, along with the execution paths of IsPrime on the corresponding inputs. Thus, by a calculation analogous to one we made above,

$$P[(Y_1 = t) \cap \overline{F}_1] = P[Y_1' = t]\, P[\overline{F}_1] = \frac{g(m)}{t}(1 - \delta).$$

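Second, we calculate $P[H_1 \cap \overline{F}_1]$. Observe that for $t = 1, \ldots, m$, we have

$$P[H_1 \mid (Y_1 = t) \cap \overline{F}_1] = \frac{t}{m},$$

and so

$$\begin{aligned} P[H_1 \cap \overline{F}_1] &= \sum_{t=1}^{m} P[H_1 \cap (Y_1 = t) \cap \overline{F}_1] \\ &= \sum_{t=1}^{m} P[H_1 \mid (Y_1 = t) \cap \overline{F}_1]\, P[(Y_1 = t) \cap \overline{F}_1] \\ &= \sum_{t=1}^{m} \frac{t}{m} \cdot \frac{g(m)}{t}(1 - \delta) = g(m)(1 - \delta). \end{aligned}$$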


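Third, let $t = 1, \ldots, m$ be a fixed integer, and let us calculate the conditional probability that $(Y_1 = t) \cap \overline{F}_1$ given $H_1$. We have

$$\begin{aligned} P[(Y_1 = t) \cap \overline{F}_1 \mid H_1] &= \frac{P[(Y_1 = t) \cap \overline{F}_1 \cap H_1]}{P[H_1]} = \frac{P[H_1 \mid (Y_1 = t) \cap \overline{F}_1]\, P[(Y_1 = t) \cap \overline{F}_1]}{P[H_1]} \\ &= \frac{(t/m)\big((1 - \delta)g(m)/t\big)}{P[H_1]} = \frac{g(m)(1 - \delta)}{m\, P[H_1]}. \end{aligned}$$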

We may now easily analyze the output distribution of Algorithm RFN. By Theorem 9.3, for each $t = 1, \ldots, m$, we have

$$P[(Y = t) \cap \overline{F}] = P[(Y_1 = t) \cap \overline{F}_1 \mid H_1] = \frac{g(m)(1 - \delta)}{m\, P[H_1]}.$$

Thus, every integer between 1 and $m$ is equally likely to be output by Algorithm RFN in correct factored form.

Let us also calculate an upper bound on the probability $P[F]$ that Algorithm RFN outputs an integer that is not in correct factored form. Making use of Exercise 8.1, we have

$$P[F_1 \mid H_1] = \frac{P[F_1 \cap H_1]}{P[H_1]} \le \frac{P[F_1]}{P[F_1 \cup H_1]}.$$

Moreover, using $g(m) \le 1$,

$$P[F_1 \cup H_1] = P[F_1] + P[H_1 \cap \overline{F}_1] = \delta + g(m)(1 - \delta) \ge g(m)\delta + g(m)(1 - \delta) = g(m).$$

By Theorem 9.3, it follows that

$$P[F] = P[F_1 \mid H_1] \le \delta/g(m).$$
Now, the reader may verify that

$$\delta \le \varepsilon \cdot (E[K_1] - 1),$$

and by our calculations in §9.5, $E[K_1] \le \log m + 2$. Thus,

$$\delta \le \varepsilon \cdot (\log m + 1),$$

and so by Mertens' theorem,

$$P[F] = O(\ell^2\varepsilon).$$

We may also analyze the statistical distance $\Delta$ between the output distribution of Algorithm RFN and the uniform distribution on $\{1, \ldots, m\}$ (in factored form). It follows from Theorem 8.31 that $\Delta = P[F] \le \delta/g(m) = O(\ell^2\varepsilon)$.
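For concreteness, the sketch below evaluates the upper bound $P[F] \le \delta/g(m) \le \varepsilon(\log m + 1)/g(m)$ for a given $m$ and error bound $\varepsilon$. As an illustration only, we plug in $\varepsilon = 4^{-k}$ for a few values of $k$. The function names are ours.

```python
import math


def g(m: int) -> float:
    """g(m) = prod_{p <= m} (1 - 1/p), computed with a simple sieve."""
    sieve = bytearray([1]) * (m + 1)
    result = 1.0
    for i in range(2, m + 1):
        if sieve[i]:  # i is prime
            result *= 1.0 - 1.0 / i
            sieve[i * i :: i] = bytearray(len(range(i * i, m + 1, i)))
    return result


def rfn_error_bound(m: int, eps: float) -> float:
    """Upper bound eps * (log m + 1) / g(m) on P[F] for Algorithm RFN."""
    return eps * (math.log(m) + 1) / g(m)


if __name__ == "__main__":
    m = 10**6
    for k in (10, 20, 30):
        eps = 4.0 ** (-k)
        # second column: whether eps * (log m + 1) <= 1/2 holds
        print(k, eps * (math.log(m) + 1) <= 0.5, rfn_error_bound(m, eps))
```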

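Finally, we analyze the expected running time of Algorithm RFN. We leave it to the reader to verify that $E[Z_1] = O(\ell\tau(\ell))$. From this, Theorem 9.3 gives the bound below; it also uses $P[H_1] \ge P[H_1 \cap \overline{F}_1] = g(m)(1 - \delta)$ and Mertens' theorem:

$$E[Z] = E[Z_1]/P[H_1] = O(\ell^2\tau(\ell)/(1 - \delta)).$$

If $\varepsilon$ is moderately small, so that $\varepsilon(\log m + 1) \le 1/2$, and hence $\delta \le 1/2$, then

$$E[Z] = O(\ell^2\tau(\ell)).$$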


9.7 Some complexity theory 

We close this chapter with a few observations about probabilistic algorithms from 
a more “complexity theoretic” point of view. 

Suppose $f$ is a function mapping bit strings to bit strings. We may have an algorithm $A$ that approximately computes $f$ in the following sense: there exists a constant $\varepsilon$, with $0 \le \varepsilon < 1/2$, such that for all inputs $x$, $A(x)$ outputs $f(x)$ with probability at least $1 - \varepsilon$. The value $\varepsilon$ is a bound on the error probability, which is defined as the probability that $A(x)$ does not output $f(x)$.


9.7.1 Reducing the error probability

There is a standard “trick” by which one can make the error probability very small. Namely, run $A$ on input $x$ some number, say $k$, times, and take the majority output as the answer. Suppose $\varepsilon < 1/2$ is a bound on the error probability. Using the Chernoff bound (Theorem 8.24), the error probability for the iterated version of $A$ is bounded by

$$\exp[-(1/2 - \varepsilon)^2 k/2], \qquad (9.8)$$

and so the error probability decreases exponentially with the number of iterations.
This bound is derived as follows. For $i = 1, \ldots, k$, let $X_i$ be the indicator variable for the event that the $i$th iteration of $A(x)$ does not output $f(x)$. The expected value of the sample mean $\overline{X} := \frac{1}{k}\sum_{i=1}^{k} X_i$ is at most $\varepsilon$. If the majority output of the iterated algorithm is wrong (or indeed, if there is no majority), then $\overline{X}$ exceeds its expectation by at least $1/2 - \varepsilon$. The bound (9.8) follows immediately from part (i) of Theorem 8.24.
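The following Python sketch compares, for a few values of $k$, the bound (9.8) with the exact probability that at least $k/2$ of $k$ independent runs err. That event is the only way the majority vote can fail. Treating each run as erring independently with probability exactly $\varepsilon$ is a simplifying assumption, and the function names are ours.

```python
import math


def prob_at_least_half_err(k: int, eps: float) -> float:
    """P[at least k/2 of k independent runs err], each run erring w.p. eps.

    The majority vote can be wrong (or undecided) only if this event occurs."""
    threshold = math.ceil(k / 2)
    return sum(
        math.comb(k, i) * eps**i * (1 - eps) ** (k - i)
        for i in range(threshold, k + 1)
    )


def chernoff_bound(k: int, eps: float) -> float:
    """The bound exp[-(1/2 - eps)^2 k / 2] from (9.8)."""
    return math.exp(-((0.5 - eps) ** 2) * k / 2)


if __name__ == "__main__":
    eps = 0.25
    for k in (1, 11, 51, 101, 201):
        print(k, prob_at_least_half_err(k, eps), chernoff_bound(k, eps))
```

The exact tail is typically far below (9.8); the point of (9.8) is the exponential decay in $k$, not tightness.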


9.7.2 Strict polynomial time

Suppose we have an algorithm $A$ that runs in expected polynomial time and approximately computes a function $f$. We can easily turn it into a new algorithm $A'$ that runs in strict polynomial time and also approximates $f$, as follows. Suppose that $\varepsilon < 1/2$ is a bound on the error probability, and $Q(n)$ is a polynomial bound on the expected running time for inputs of size $n$. Then $A'$ simply runs $A$ for at most $kQ(n)$ steps, where $k$ is any constant chosen so that $\varepsilon + 1/k < 1/2$. If $A$ does not halt within this time bound, then $A'$ simply halts with an arbitrary output. The probability that $A'$ errs is at most the probability that $A$ errs plus the probability that $A$ runs for more than $kQ(n)$ steps. By Markov's inequality (Theorem 8.22), the latter probability is at most $1/k$. Hence $A'$ also approximates $f$, but with an error probability bounded by $\varepsilon + 1/k$.
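One way to picture the construction of $A'$ is to model $A$ as a Python generator that yields once per step and returns its output when it halts. $A'$ then drives this generator for at most a fixed number of steps. The generator-based step model, the function names, and the toy algorithm below are illustrative assumptions, not part of the text.

```python
import random
from typing import Generator, TypeVar

T = TypeVar("T")


def run_with_step_limit(run: Generator[None, None, T], max_steps: int, default: T) -> T:
    """Sketch of A': execute at most max_steps steps of A, else return default."""
    for _ in range(max_steps):
        try:
            next(run)          # run A up to its next step boundary
        except StopIteration as stop:
            return stop.value  # A halted in time: use its output
    return default             # A did not halt in time: arbitrary output


def toy_algorithm(x: int, rng: random.Random) -> Generator[None, None, int]:
    """A toy A: flips coins until heads (about 2 steps expected), then outputs x % 2."""
    while rng.random() < 0.5:
        yield
    yield
    return x % 2


if __name__ == "__main__":
    rng = random.Random(7)
    # Expected running time Q = 2 steps; with k = 10, A' allows k * Q = 20 steps.
    print(run_with_step_limit(toy_algorithm(5, rng), max_steps=20, default=0))
```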


9.7.3 Language recognition

An important special case of approximately computing a function is when the output of the function $f$ is either 0 or 1 (or equivalently, false or true). In this case, $f$ may be viewed as the characteristic function of the language $L := \{x : f(x) = 1\}$. (It is the tradition of computational complexity theory to call sets of bit strings “languages.”) Several “flavors” of probabilistic algorithms for approximately computing the characteristic function $f$ of a language $L$ are traditionally considered. For the purposes of these definitions, we may restrict ourselves to algorithms that output either 0 or 1:

• We call a probabilistic, expected polynomial-time algorithm an Atlantic City algorithm for recognizing $L$ if it approximately computes $f$ with error probability bounded by a constant $\varepsilon < 1/2$.

• We call a probabilistic, expected polynomial-time algorithm $A$ a Monte Carlo algorithm for recognizing $L$ if for some constant $\delta > 0$, we have:

  - $P[A(x) \text{ outputs } 1] \ge \delta$ for all $x \in L$;

  - $P[A(x) \text{ outputs } 1] = 0$ for all $x \notin L$.

• We call a probabilistic, expected polynomial-time algorithm a Las Vegas algorithm for recognizing $L$ if it computes $f$ correctly on all inputs $x$.

One also says an Atlantic City algorithm has two-sided error, a Monte Carlo
algorithm has one-sided error, and a Las Vegas algorithm has zero-sided error. 


Exercise 9.14. Show that every language recognized by a Las Vegas algorithm 
is also recognized by a Monte Carlo algorithm, and that every language recognized 
by a Monte Carlo algorithm is also recognized by an Atlantic City algorithm. 

Exercise 9.15. Show that if $L$ is recognized by an Atlantic City algorithm that runs in expected polynomial time, then it is recognized by an Atlantic City algorithm that runs in strict polynomial time, and whose error probability is at most $2^{-n}$ on inputs of size $n$.

Exercise 9.16. Show that if $L$ is recognized by a Monte Carlo algorithm that runs in expected polynomial time, then it is recognized by a Monte Carlo algorithm that runs in strict polynomial time, and whose error probability is at most $2^{-n}$ on inputs of size $n$.

Exercise 9.17. Show that a language is recognized by a Las Vegas algorithm if and only if the language and its complement are recognized by Monte Carlo algorithms.

Exercise 9.18. Show that if L is recognized by a Las Vegas algorithm that runs 
in strict polynomial time, then L may be recognized in deterministic polynomial 
time. 

Exercise 9.19. Suppose that for a given language $L$, there exists a probabilistic algorithm $A$ that runs in expected polynomial time, and always outputs either 0 or 1. Further suppose that for some constants $\alpha$ and $c$, where

• $\alpha$ is a rational number with $0 \le \alpha < 1$, and

• $c$ is a positive integer,

and for all sufficiently large $n$, and all inputs $x$ of size $n$, we have

• if $x \notin L$, then $P[A(x) \text{ outputs } 1] \le \alpha$, and

• if $x \in L$, then $P[A(x) \text{ outputs } 1] \ge \alpha + 1/n^c$.

(a) Show that there exists an Atlantic City algorithm for $L$.

(b) Show that if $\alpha = 0$, then there exists a Monte Carlo algorithm for $L$.


9.8 Notes 

Our approach in §9.1 to defining the probability distribution associated with the 
execution of a probabilistic algorithm is not the only possible one. For example,
one could define the output distribution and expected running time of an algorithm
on a given input directly, using the identities in Exercise 9.2, and avoid the con- 
struction of an underlying probability distribution altogether; however, we would 
then have very few tools at our disposal to analyze the behavior of an algorithm. 
Yet another approach is to define a distribution that models an infinite random bit 
string. This can be done, but requires more advanced notions from probability 
theory than those that have been covered in this text. 

The algorithm presented in §9.6 for generating a random factored number is due 
to Kalai [52], although the analysis presented here is a bit different, and our anal- 
ysis using a probabilistic primality test is new. Kalai’s algorithm is significantly 
simpler, though less efficient, than an earlier algorithm due to Bach [9], which uses an expected number of $O(\ell)$ primality tests, as opposed to the $O(\ell^2)$ primality tests used by Kalai's algorithm.

See Luby [63] for an exposition of the theory of pseudo-random bit generation. 

In this chapter, we discuss some simple and efficient probabilistic algorithms for
testing whether a given integer is prime. 


10.1 Trial division 

Suppose we are given an integer $n > 1$, and we want to determine whether $n$ is prime or composite. The simplest algorithm to describe and to program is trial division. We simply divide $n$ by 2, 3, and so on, testing if any of these numbers evenly divide $n$. Of course, we only need to divide by primes up to $\sqrt{n}$, since if $n$ is composite, it must have a prime factor no greater than $\sqrt{n}$ (see Exercise 1.2). This algorithm not only determines whether $n$ is prime or composite, but also produces a non-trivial factor of $n$ when $n$ is composite.
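A direct Python rendering of trial division is shown below. As a simplification, it divides by every integer $d$ with $2 \le d \le \sqrt{n}$ rather than by primes only. This does not change the result, since the smallest divisor found is always prime. The function name is ours.

```python
import math
from typing import Optional


def trial_division(n: int) -> Optional[int]:
    """Return the smallest prime factor of n if n is composite, or None if n is prime."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d  # the smallest divisor > 1 is necessarily prime
    return None


if __name__ == "__main__":
    for n in (97, 91, 561):
        print(n, trial_division(n))
```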

Of course, the drawback of this algorithm is that it is terribly inefficient: it requires $O(\sqrt{n})$ arithmetic operations, which is exponential in the bit length of $n$. Thus, for practical purposes, this algorithm is limited to quite small $n$. Suppose, for example, that $n$ has 100 decimal digits, and that a computer can perform 1 billion divisions per second (this is much faster than any computer existing today). Then it would take on the order of $10^{33}$ years to perform $\sqrt{n}$ divisions.
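The estimate can be checked with a few lines of arithmetic. Here we take $n \approx 10^{100}$, so that $\sqrt{n} \approx 10^{50}$, and a year of 365.25 days; both are simplifying assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
divisions = 10**50            # about sqrt(n) for a 100-digit n
rate = 10**9                  # divisions per second
years = divisions / rate / SECONDS_PER_YEAR
print(f"{years:.2e} years")   # prints 3.17e+33 years
```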

In this chapter, we discuss a much faster primality test that allows 100-decimal-digit numbers to be tested for primality in less than a second. Unlike the above test, however, this test does not find a factor of $n$ when $n$ is composite. Moreover, the algorithm is probabilistic, and may in fact make a mistake. However, the probability that it makes a mistake can be made so small as to be irrelevant for all practical purposes. Indeed, we can easily make the probability of error as small as $2^{-100}$. Should one really care about an event that happens with such a minuscule probability?


10.2 The Miller-Rabin test

We describe in this section a fast (polynomial time) test for primality, known as 
the Miller-Rabin test. As discussed above, the algorithm is probabilistic, and 
may (with small probability) make a mistake. 

We assume for the remainder of this section that the number n we are testing for 
primality is an odd integer greater than 1. 

We recall some basic algebraic facts that will play a critical role in this section (see §7.5). Suppose $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$ (since $n$ is odd, each $p_i$ is odd). The Chinese remainder theorem gives us a ring isomorphism

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}}, \qquad [a]_n \mapsto ([a]_{p_1^{e_1}}, \ldots, [a]_{p_r^{e_r}}),$$

and restricting $\theta$ to $\mathbb{Z}_n^*$ yields a group isomorphism

$$\mathbb{Z}_n^* \cong \mathbb{Z}_{p_1^{e_1}}^* \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^*.$$

Moreover, Theorem 7.28 says that each $\mathbb{Z}_{p_i^{e_i}}^*$ is a cyclic group, whose order, of course, is $\varphi(p_i^{e_i}) = p_i^{e_i - 1}(p_i - 1)$, where $\varphi$ is Euler's phi function.

Several probabilistic primality tests, including the Miller-Rabin test, have the following general structure. Define $\mathbb{Z}_n^+$ to be the set of non-zero elements of $\mathbb{Z}_n$; thus, $|\mathbb{Z}_n^+| = n - 1$, and if $n$ is prime, $\mathbb{Z}_n^+ = \mathbb{Z}_n^*$. Suppose also that we define a set $L_n \subseteq \mathbb{Z}_n^+$ such that:

• there is an efficient algorithm that on input $n$ and $\alpha \in \mathbb{Z}_n^+$, determines if $\alpha \in L_n$;

• if $n$ is prime, then $L_n = \mathbb{Z}_n^*$;

• if $n$ is composite, $|L_n| \le c(n - 1)$ for some constant $c < 1$.

To test $n$ for primality, we set a “repetition parameter” $k$, and choose random elements $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n^+$. If $\alpha_i \in L_n$ for all $i = 1, \ldots, k$, then we output true; otherwise, we output false.

It is easy to see that if $n$ is prime, this algorithm always outputs true, and if $n$ is composite this algorithm outputs true with probability at most $c^k$. If $c = 1/2$ and $k$ is chosen large enough, say $k = 100$, then the probability that the output is wrong is so small that for all practical purposes, it is “just as good as zero.”

We now make a first attempt at defining a suitable set $L_n$. Let us define

$$L_n := \{\alpha \in \mathbb{Z}_n^+ : \alpha^{n-1} = 1\}.$$

Note that $L_n \subseteq \mathbb{Z}_n^*$, since if $\alpha^{n-1} = 1$, then $\alpha$ has a multiplicative inverse, namely, $\alpha^{n-2}$. We can test if $\alpha \in L_n$ in time $O(\operatorname{len}(n)^3)$, using a repeated-squaring algorithm.
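The general structure described above, instantiated with this first choice of $L_n$, can be sketched in Python as follows. The function names are ours. `pow(a, e, n)` is Python's built-in modular exponentiation, which uses repeated squaring. This particular set $L_n$ turns out to be inadequate for some composite $n$, as the discussion of Carmichael numbers that follows shows.

```python
import random
from typing import Callable


def in_fermat_set(a: int, n: int) -> bool:
    """Membership in L_n = {a in Z_n^+ : a^(n-1) = 1}."""
    return pow(a, n - 1, n) == 1


def generic_test(n: int, k: int, in_L: Callable[[int, int], bool],
                 rng: random.Random) -> bool:
    """Output True iff k random elements of Z_n^+ = {1, ..., n-1} all lie in L_n."""
    for _ in range(k):
        a = rng.randrange(1, n)  # uniform over the non-zero residues mod n
        if not in_L(a, n):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    print(generic_test(101, 20, in_fermat_set, rng))  # 101 is prime
    print(generic_test(15, 20, in_fermat_set, rng))   # 15 = 3 * 5
```

Note that `in_fermat_set(2, 341)` is true even though $341 = 11 \cdot 31$, so a single base can be fooled by a composite.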

Theorem 10.1. If $n$ is prime, then $L_n = \mathbb{Z}_n^*$. If $n$ is composite and $L_n \subsetneq \mathbb{Z}_n^*$, then $|L_n| \le (n - 1)/2$.

Proof. Note that $L_n$ is the kernel of the $(n - 1)$-power map on $\mathbb{Z}_n^*$, and hence is a subgroup of $\mathbb{Z}_n^*$.

If $n$ is prime, then we know that $\mathbb{Z}_n^*$ is a group of order $n - 1$. Since the order of a group element divides the order of the group, we have $\alpha^{n-1} = 1$ for all $\alpha \in \mathbb{Z}_n^*$. That is, $L_n = \mathbb{Z}_n^*$.

Suppose that $n$ is composite and $L_n \subsetneq \mathbb{Z}_n^*$. Since the order of a subgroup divides the order of the group, we have $|\mathbb{Z}_n^*| = t|L_n|$ for some integer $t > 1$. From this, we conclude that

$$|L_n| = \frac{1}{t}|\mathbb{Z}_n^*| \le \frac{1}{2}|\mathbb{Z}_n^*| \le \frac{n - 1}{2}. \qquad \square$$

Unfortunately, there are odd composite numbers $n$ such that $L_n = \mathbb{Z}_n^*$. Such numbers are called Carmichael numbers. The smallest Carmichael number is

$$561 = 3 \cdot 11 \cdot 17.$$

Carmichael numbers are extremely rare, but it is known that there are infinitely many of them, so we cannot ignore them. The following theorem puts some constraints on Carmichael numbers.

Theorem 10.2. Every Carmichael number $n$ is of the form $n = p_1 \cdots p_r$, where the $p_i$'s are distinct primes, $r \ge 3$, and $(p_i - 1) \mid (n - 1)$ for $i = 1, \ldots, r$.

Proof. Let $n = p_1^{e_1} \cdots p_r^{e_r}$ be a Carmichael number. By the Chinese remainder theorem, we have an isomorphism of $\mathbb{Z}_n^*$ with the group

$$\mathbb{Z}_{p_1^{e_1}}^* \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^*,$$

and we know that each group $\mathbb{Z}_{p_i^{e_i}}^*$ is cyclic of order $p_i^{e_i - 1}(p_i - 1)$. Thus, the power $n - 1$ kills the group $\mathbb{Z}_n^*$ if and only if it kills all the groups $\mathbb{Z}_{p_i^{e_i}}^*$, which happens if and only if $p_i^{e_i - 1}(p_i - 1) \mid (n - 1)$. Now, on the one hand, $n \equiv 0 \pmod{p_i}$. On the other hand, if $e_i > 1$, then $p_i \mid (n - 1)$, so we would have $n \equiv 1 \pmod{p_i}$, which is clearly impossible. Thus, we must have $e_i = 1$.

It remains to show that $r \ge 3$. Suppose $r = 2$, so that $n = p_1 p_2$. We have

$$n - 1 = p_1 p_2 - 1 = (p_1 - 1)p_2 + (p_2 - 1).$$

Since $(p_1 - 1) \mid (n - 1)$, we must have $(p_1 - 1) \mid (p_2 - 1)$. By a symmetric argument, $(p_2 - 1) \mid (p_1 - 1)$. Hence, $p_1 = p_2$, a contradiction. $\square$
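A brute-force search illustrates both the definition and Theorem 10.2. The sketch below finds the odd composite $n < 10^4$ with $\alpha^{n-1} = 1$ for every $\alpha \in \mathbb{Z}_n^*$. It then checks that each one is a product of at least three distinct primes $p$ with $(p - 1) \mid (n - 1)$. The helper names are ours.

```python
import math


def prime_factors(n: int) -> list:
    """Prime factors of n with multiplicity, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def is_carmichael(n: int) -> bool:
    """True iff n is odd, composite, and a^(n-1) = 1 (mod n) for all units a."""
    if n < 3 or n % 2 == 0 or len(prime_factors(n)) < 2:
        return False
    # all() stops at the first unit a with a^(n-1) != 1, so most n fail fast
    return all(pow(a, n - 1, n) == 1 for a in range(2, n) if math.gcd(a, n) == 1)


if __name__ == "__main__":
    carmichael = [n for n in range(3, 10_000) if is_carmichael(n)]
    print(carmichael)  # [561, 1105, 1729, 2465, 2821, 6601, 8911]
    for n in carmichael:
        ps = prime_factors(n)
        assert len(set(ps)) == len(ps) >= 3             # distinct primes, r >= 3
        assert all((n - 1) % (p - 1) == 0 for p in ps)  # (p - 1) | (n - 1)
```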

To obtain a good primality test, we need to define a different set $L_n'$, which we do as follows. Let $n - 1 = t2^h$, where $t$ is odd (and $h \ge 1$ since $n$ is assumed odd), and define

$$L_n' := \{\alpha \in \mathbb{Z}_n^+ : \alpha^{t2^h} = 1 \text{ and } \alpha^{t2^{j+1}} = 1 \implies \alpha^{t2^j} = \pm 1 \text{ for } j = 0, \ldots, h - 1\}.$$

The Miller-Rabin test uses this set $L_n'$, in place of the set $L_n$ defined above. It is clear from the definition that $L_n' \subseteq L_n$.

Testing whether a given $\alpha \in \mathbb{Z}_n^+$ belongs to $L_n'$ can be done using the following procedure:

        β ← α^t
        if β = 1 then return true
        for j ← 0 to h − 1 do
            if β = −1 then return true
            if β = +1 then return false
            β ← β^2
        return false

It is clear that using a repeated-squaring algorithm, this procedure runs in time $O(\operatorname{len}(n)^3)$. We leave it to the reader to verify that this procedure correctly determines membership in $L_n'$.
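The membership procedure above translates directly into Python. The sketch below computes $t$ and $h$ with $n - 1 = t2^h$ and represents $-1$ by the residue $n - 1$. It uses the built-in three-argument `pow` for repeated squaring. The function names are ours.

```python
def decompose(n: int) -> tuple:
    """Write n - 1 = t * 2^h with t odd; n is assumed odd and > 1."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t //= 2
        h += 1
    return t, h


def in_miller_rabin_set(a: int, n: int) -> bool:
    """Decide whether a (an element of Z_n^+) belongs to L'_n."""
    t, h = decompose(n)
    beta = pow(a, t, n)
    if beta == 1:
        return True
    for _ in range(h):
        if beta == n - 1:      # beta = -1
            return True
        if beta == 1:          # beta = +1 reached without passing through -1
            return False
        beta = beta * beta % n
    return False


if __name__ == "__main__":
    # For n = 561, the powers 2^35, 2^70, 2^140, 2^280 are 263, 166, 67, 1 mod 561,
    # so 2 is not in L'_561 even though 561 is a Carmichael number.
    print(in_miller_rabin_set(2, 561))
```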

Theorem 10.3. If $n$ is prime, then $L_n' = \mathbb{Z}_n^*$. If $n$ is composite, then $|L_n'| \le (n - 1)/4$.

Proof. Let $n - 1 = t2^h$, where $t$ is odd.

Case 1: $n$ is prime. Let $\alpha \in \mathbb{Z}_n^*$. Since $\mathbb{Z}_n^*$ is a group of order $n - 1$, and the order of a group element divides the order of the group, we know that $\alpha^{t2^h} = \alpha^{n-1} = 1$.

Now consider any index $j = 0, \ldots, h - 1$ such that $\alpha^{t2^{j+1}} = 1$, and consider the value $\beta := \alpha^{t2^j}$. Then since $\beta^2 = \alpha^{t2^{j+1}} = 1$, the only possible choices for $\beta$ are $\pm 1$. This is because $\mathbb{Z}_n^*$ is cyclic of even order, and so there are exactly two elements of $\mathbb{Z}_n^*$ whose multiplicative order divides 2, namely $\pm 1$. So we have shown that $\alpha \in L_n'$.

Case 2: $n = p^e$, where $p$ is prime and $e > 1$. Certainly, $L_n'$ is contained in the kernel $K$ of the $(n - 1)$-power map on $\mathbb{Z}_n^*$. By Theorem 6.32, $|K| = \gcd(\varphi(n), n - 1)$. Since $n = p^e$, we have $\varphi(n) = p^{e-1}(p - 1)$, and so

$$|L_n'| \le |K| = \gcd(p^{e-1}(p - 1), p^e - 1) = p - 1 = \frac{p^e - 1}{p^{e-1} + \cdots + 1} \le \frac{n - 1}{4},$$

where the last step uses $p \ge 3$ and $e \ge 2$, so that the denominator is at least $p + 1 \ge 4$.

Case 3: $n = p_1^{e_1} \cdots p_r^{e_r}$ is the prime factorization of $n$, and $r > 1$. Let

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}}$$

be the ring isomorphism provided by the Chinese remainder theorem. Also, let $\varphi(p_i^{e_i}) = t_i 2^{h_i}$, with $t_i$ odd, for $i = 1, \ldots, r$, and let $g := \min\{h, h_1, \ldots, h_r\}$. Note that $g \ge 1$, and that each $\mathbb{Z}_{p_i^{e_i}}^*$ is a cyclic group of order $t_i 2^{h_i}$.

We first claim that for every $\alpha \in L_n'$, we have $\alpha^{t2^g} = 1$. To prove this, first note that if $g = h$, then by definition, $\alpha^{t2^g} = 1$, so suppose that $g < h$. By way of contradiction, suppose that $\alpha^{t2^g} \ne 1$, and let $j$ be the smallest index in the range $g, \ldots, h - 1$ such that $\alpha^{t2^{j+1}} = 1$. By the definition of $L_n'$, we must have $\alpha^{t2^j} = -1$. Since $g < h$, we must have $g = h_i$ for some particular index $i = 1, \ldots, r$. Writing $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, we have $\alpha_i^{t2^j} = -1$. This implies that the multiplicative order of $\alpha_i^t$ is equal to $2^{j+1}$ (see Theorem 6.37). However, since $j \ge g = h_i$, this contradicts the fact that the order of a group element (in this case, $\alpha_i^t$) must divide the order of the group (in this case, $\mathbb{Z}_{p_i^{e_i}}^*$).

For $j = 0, \ldots, h$, let us define $\rho_j$ to be the $(t2^j)$-power map on $\mathbb{Z}_n^*$. From the claim in the previous paragraph, and the definition of $L_n'$, it follows that each $\alpha \in L_n'$ satisfies $\alpha^{t2^{g-1}} = \pm 1$. In other words, $L_n' \subseteq \rho_{g-1}^{-1}(\{\pm 1\})$, and hence

$$|L_n'| \le 2|\operatorname{Ker} \rho_{g-1}|. \qquad (10.1)$$

From the group isomorphism $\mathbb{Z}_n^* \cong \mathbb{Z}_{p_1^{e_1}}^* \times \cdots \times \mathbb{Z}_{p_r^{e_r}}^*$, and Theorem 6.32, we have

$$|\operatorname{Ker} \rho_j| = \prod_{i=1}^{r} \gcd(t_i 2^{h_i}, t2^j) \qquad (10.2)$$

for each $j = 0, \ldots, h$. Since $g \le h$, and $g \le h_i$ for $i = 1, \ldots, r$, it follows immediately from (10.2) that

$$2^r |\operatorname{Ker} \rho_{g-1}| = |\operatorname{Ker} \rho_g| \le |\operatorname{Ker} \rho_h|. \qquad (10.3)$$

Combining (10.3) with (10.1), we obtain

$$|L_n'| \le 2^{-r+1} |\operatorname{Ker} \rho_h|. \qquad (10.4)$$

If $r \ge 3$, then (10.4) directly implies that $|L_n'| \le |\mathbb{Z}_n^*|/4 \le (n - 1)/4$, and we are done. So suppose that $r = 2$. In this case, Theorem 10.2 implies that $n$ is not a Carmichael number, which implies that $|\operatorname{Ker} \rho_h| \le |\mathbb{Z}_n^*|/2$, and so again, (10.4) implies $|L_n'| \le |\mathbb{Z}_n^*|/4 \le (n - 1)/4$. $\square$
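Theorem 10.3 can be spot-checked by brute force for small $n$. The sketch below counts $|L_n'|$ for every odd composite $n$ below a small limit and confirms that $|L_n'| \le (n - 1)/4$. It repeats the membership procedure so that it runs on its own. The names are ours, and a finite check of course proves nothing about larger $n$.

```python
import math


def in_miller_rabin_set(a: int, n: int) -> bool:
    """Membership in L'_n, following the procedure given earlier in this section."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t, h = t // 2, h + 1
    beta = pow(a, t, n)
    if beta == 1:
        return True
    for _ in range(h):
        if beta == n - 1:
            return True
        if beta == 1:
            return False
        beta = beta * beta % n
    return False


def liar_count(n: int) -> int:
    """|L'_n|: the number of a in {1, ..., n-1} that lie in L'_n."""
    return sum(in_miller_rabin_set(a, n) for a in range(1, n))


def is_composite(n: int) -> bool:
    """Trial-division compositeness check (small n only)."""
    return any(n % d == 0 for d in range(2, math.isqrt(n) + 1))


if __name__ == "__main__":
    worst = max((liar_count(n) / (n - 1), n)
                for n in range(9, 1000, 2) if is_composite(n))
    print("largest |L'_n| / (n - 1) for odd composite n < 1000:", worst)
```

For $n = 9$ the bound is attained: $L_9' = \{\pm 1\}$, so $|L_9'| = 2 = (9 - 1)/4$.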

Exercise 10.1. Show that an integer $n > 1$ is prime if and only if there exists an element in $\mathbb{Z}_n^*$ of multiplicative order $n - 1$.

Exercise 10.2. Show that Carmichael numbers satisfy Fermat's little theorem; that is, if $n$ is a Carmichael number, then $\alpha^n = \alpha$ for all $\alpha \in \mathbb{Z}_n$.





Exercise 10.3. Let $p$ be a prime. Show that $n := 2p + 1$ is a prime if and only if $2^{n-1} \equiv 1 \pmod{n}$.

Exercise 10.4. Here is another primality test that takes as input an odd integer $n > 1$, and a positive integer parameter $k$. The algorithm chooses $\alpha_1, \ldots, \alpha_k \in \mathbb{Z}_n^+$ at random, and computes

$$\beta_i := \alpha_i^{(n-1)/2} \quad (i = 1, \ldots, k).$$

If $(\beta_1, \ldots, \beta_k)$ is of the form $(\pm 1, \pm 1, \ldots, \pm 1)$, but is not equal to $(1, 1, \ldots, 1)$, the algorithm outputs true; otherwise, the algorithm outputs false. Show that if $n$ is prime, then the algorithm outputs false with probability at most $2^{-k}$, and if $n$ is composite, the algorithm outputs true with probability at most $2^{-k}$.

In the terminology of §9.7, the algorithm in the above exercise is an example of 
an “Atlantic City” algorithm for the language of prime numbers (or equivalently, 
the language of composite numbers), while the Miller-Rabin test is an example of 
a “Monte Carlo” algorithm for the language of composite numbers. 


10.3 Generating random primes using the Miller-Rabin test 

The Miller-Rabin test is the most practical algorithm known for testing primality, 
and because of this, it is widely used in many applications, especially cryptographic 
applications where one needs to generate large, random primes (as we saw in §4.7). 
In this section, we discuss how one uses the Miller-Rabin test in several practically 
relevant scenarios where one must generate large primes. 


10.3.1 Generating a random prime between 2 and m 

Suppose we are given an integer $m \ge 2$, and want to generate a random prime
between 2 and m. We can do this by simply picking numbers at random until one 
of them passes a primality test. We discussed this problem in some detail in §9.4, 
where we assumed that we had a primality test IsPrime. The reader should review 
§9.4, and §9.4.1 in particular. In this section, we discuss aspects of this problem 
that are specific to the situation where the Miller-Rabin test is used to implement 
IsPrime. To be more precise, let us define the following algorithm: 

Algorithm MR. On input $n, k$, where $n$ and $k$ are integers with $n > 1$ and $k \ge 1$, do the following:

        if n = 2 then return true
        if n is even then return false
        repeat k times
            α ←R Z_n^+
            if α ∉ L_n' return false
        return true

So we shall implement IsPrime$(\cdot)$ as MR$(\cdot, k)$, where $k$ is an auxiliary parameter. By Theorem 10.3, if $n$ is prime, the output of MR$(n, k)$ is always true, while if $n$ is composite, the output is true with probability at most $4^{-k}$. Thus, this implementation of IsPrime satisfies the assumptions in §9.4.1, with $\varepsilon = 4^{-k}$.
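Algorithm MR can be sketched in Python as follows. The membership test for $L_n'$ is the procedure given in §10.2, and random elements of $\mathbb{Z}_n^+$ are drawn with `random.Random.randrange`. This is an illustrative sketch (names ours), not a hardened implementation. In particular, Python's `random` module is not a cryptographically secure source of randomness.

```python
import random


def in_miller_rabin_set(a: int, n: int) -> bool:
    """Membership in L'_n for odd n > 1 (procedure from Section 10.2)."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t, h = t // 2, h + 1
    beta = pow(a, t, n)
    if beta == 1:
        return True
    for _ in range(h):
        if beta == n - 1:
            return True
        if beta == 1:
            return False
        beta = beta * beta % n
    return False


def mr(n: int, k: int, rng: random.Random) -> bool:
    """Algorithm MR for n > 1: True if n passes k rounds of the Miller-Rabin test."""
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for _ in range(k):
        a = rng.randrange(1, n)  # a uniform element of Z_n^+ = {1, ..., n-1}
        if not in_miller_rabin_set(a, n):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(42)
    print(mr(2**61 - 1, 20, rng), mr(561, 20, rng))
```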

Let $\gamma(m, k)$ be the probability that the output of Algorithm RP in §9.4, using this implementation of IsPrime, is composite. Then, as we discussed in §9.4.1,

$$\gamma(m, k) \le 4^{-k} \cdot \frac{m - 1}{\pi(m)} = O(4^{-k}\ell), \qquad (10.5)$$

where $\ell := \operatorname{len}(m)$, and $\pi(m)$ is the number of primes up to $m$. Furthermore, if the output of Algorithm RP is prime, then every prime is equally likely; that is, the conditional distribution of the output, given that the output is prime, is (essentially) the uniform distribution on the set of primes up to $m$.

Let us now consider the expected running time of Algorithm RP. As discussed in §9.4.1, the expected number of iterations of the main loop in Algorithm RP is $O(\ell)$. Clearly, the expected running time of a single loop iteration is $O(k\ell^3)$, since MR$(n, k)$ executes at most $k$ iterations of the Miller-Rabin test, and each such test takes time $O(\ell^3)$. This leads to a bound of $O(k\ell^4)$ on the expected total running time of Algorithm RP. However, this estimate is overly pessimistic, because when $n$ is composite, we expect to perform very few Miller-Rabin tests; only when $n$ is prime do we actually perform all $k$ of them.

To make a rigorous argument, let us define random variables measuring various quantities during the first iteration of the main loop in Algorithm RP: $N_1$ is the value of $n$; $K_1$ is the number of Miller-Rabin tests actually performed; $Z_1$ is the running time. Of course, $N_1$ is uniformly distributed over $\{2, \ldots, m\}$. Let $C_1$ be the event that $N_1$ is composite. Consider the conditional distribution of $K_1$ given $C_1$. This is not exactly a geometric distribution, since $K_1$ never takes on values greater than $k$. Nevertheless, using Theorem 8.17, we can easily calculate

$$E[K_1 \mid C_1] = \sum_{i \ge 1} P[K_1 \ge i \mid C_1] \le \sum_{i \ge 1} (1/4)^{i-1} = 4/3.$$
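The effect is easy to observe empirically. The sketch below implements Algorithm RP, assuming (as the analysis above indicates) that each iteration chooses $n$ uniformly from $\{2, \ldots, m\}$ and outputs it if MR$(n, k)$ returns true. It records how many Miller-Rabin rounds are spent on each composite candidate; even candidates are rejected after zero rounds. The average should stay below the bound $4/3$. All names are ours, and trial division serves only as ground truth for this small experiment.

```python
import math
import random
from typing import List, Tuple


def in_miller_rabin_set(a: int, n: int) -> bool:
    """Membership in L'_n for odd n > 1 (procedure from Section 10.2)."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t, h = t // 2, h + 1
    beta = pow(a, t, n)
    if beta == 1:
        return True
    for _ in range(h):
        if beta == n - 1:
            return True
        if beta == 1:
            return False
        beta = beta * beta % n
    return False


def mr_counting(n: int, k: int, rng: random.Random) -> Tuple[bool, int]:
    """Algorithm MR, also returning how many Miller-Rabin rounds were run."""
    if n == 2:
        return True, 0
    if n % 2 == 0:
        return False, 0
    for i in range(1, k + 1):
        if not in_miller_rabin_set(rng.randrange(1, n), n):
            return False, i
    return True, k


def is_composite(n: int) -> bool:
    """Ground truth for the experiment only (trial division, small n)."""
    return n > 3 and any(n % d == 0 for d in range(2, math.isqrt(n) + 1))


def random_prime(m: int, k: int, rng: random.Random,
                 rounds_on_composites: List[int]) -> int:
    """Sketch of Algorithm RP; logs the rounds spent on each composite candidate."""
    while True:
        n = rng.randint(2, m)
        accepted, rounds = mr_counting(n, k, rng)
        if is_composite(n):
            rounds_on_composites.append(rounds)
        if accepted:
            return n


if __name__ == "__main__":
    rng = random.Random(1)
    log: List[int] = []
    for _ in range(200):
        random_prime(10**6, 20, rng, log)
    print("average rounds per composite candidate:", sum(log) / len(log))
```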





Using the law of total expectation (8.24), it follows that

$$E[K_1] = E[K_1 \mid C_1]\, P[C_1] + E[K_1 \mid \overline{C}_1]\, P[\overline{C}_1] \le 4/3 + k\pi(m)/(m - 1).$$

Thus, $E[K_1] \le 4/3 + O(k/\ell)$, and hence $E[Z_1] = O(\ell^3 E[K_1]) = O(\ell^3 + k\ell^2)$. Therefore, if $Z$ is the total running time of Algorithm RP, then $E[Z] = O(\ell E[Z_1])$, and so

$$E[Z] = O(\ell^4 + k\ell^3). \qquad (10.6)$$

Note that the above estimate (10.5) for $\gamma(m, k)$ is actually quite pessimistic. This is because the error probability $4^{-k}$ is a worst-case estimate; in fact, for “most” composite integers $n$, the probability that MR$(n, k)$ outputs true is much smaller than this. In fact, $\gamma(m, 1)$ is very small for large $m$. For example, the following is known:

Theorem 10.4. We have

$$\gamma(m, 1) \le \exp[-(1 + o(1)) \log(m) \log(\log(\log(m))) / \log(\log(m))].$$

Proof. Literature — see §10.5. □ 

The bound in the above theorem goes to zero quite quickly: faster than $(\log m)^{-c}$ for every positive constant $c$. While the above theorem is asymptotically very good, in practice, one needs explicit bounds. For example, the following lower bounds for $-\log_2(\gamma(2^\ell, 1))$ are known:

$$\begin{array}{c|ccccc} \ell & 200 & 300 & 400 & 500 & 600 \\ \hline -\log_2(\gamma(2^\ell, 1)) \ge & 3 & 19 & 37 & 55 & 74 \end{array}$$

Given an upper bound on $\gamma(m, 1)$, we can bound $\gamma(m, k)$ for $k \ge 2$ using the following inequality:

$$\gamma(m, k) \le \frac{\gamma(m, 1)}{1 - \gamma(m, 1)}\, 4^{-k+1}. \qquad (10.7)$$

To prove (10.7), it is not hard to see that on input $m$, the output distribution of Algorithm RP is the same as that of the following algorithm:

        repeat
            repeat
                n' ←R {2, ..., m}
            until MR(n', 1)
            n ← n'
        until MR(n, k − 1)
        output n

Let $N_1$ be the random variable representing the value of $n$ in the first iteration of the main loop in this algorithm, let $C_1$ be the event that $N_1$ is composite, and let $H_1$ be the event that this algorithm halts at the end of the first iteration of the main loop. Note that $P[H_1] \ge P[\overline{C}_1]$, since MR$(n, k - 1)$ always accepts a prime $n$; that $P[H_1 \mid C_1] \le 4^{-k+1}$; and that $P[C_1] = \gamma(m, 1)$. Using Theorem 9.3, we see that

$$\gamma(m, k) = P[C_1 \mid H_1] = \frac{P[C_1 \cap H_1]}{P[H_1]} \le \frac{P[C_1 \cap H_1]}{P[\overline{C}_1]} = \frac{P[H_1 \mid C_1]\, P[C_1]}{P[\overline{C}_1]} \le \frac{4^{-k+1}\gamma(m, 1)}{1 - \gamma(m, 1)},$$

which proves (10.7).


Given that $\gamma(m,1)$ is so small, for large $m$, Algorithm RP actually exhibits the following behavior in practice: it generates a random value $n \in \{2, \ldots, m\}$; if $n$ is odd and composite, then the very first iteration of the Miller-Rabin test will detect this with overwhelming probability, and no more iterations of the test are performed on this $n$; otherwise, if $n$ is prime, the algorithm will perform $k - 1$ more iterations of the Miller-Rabin test, “just to make sure.”
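To make the practical behavior just described concrete, here is a minimal Python sketch of Algorithm RP with IsPrime(·) implemented as MR(·, $k$). It is our own illustration, not code from the text: the function names are hypothetical, the Miller-Rabin round is the standard check that $[a]_n \in L'_n$ for odd $n > 2$, and Python's `random` module stands in for the random choices (a cryptographic application would use the `secrets` module instead).

```python
import random


def passes_mr_round(a: int, n: int) -> bool:
    """One Miller-Rabin round for odd n > 2: True iff [a]_n lies in L'_n."""
    t, h = n - 1, 0
    while t % 2 == 0:            # write n - 1 = t * 2^h with t odd
        t //= 2
        h += 1
    beta = pow(a, t, n)
    if beta == 1 or beta == n - 1:
        return True
    for _ in range(h - 1):       # square up to h - 1 more times
        beta = beta * beta % n
        if beta == n - 1:
            return True
        if beta == 1:            # non-trivial square root of 1 found
            return False
    return False


def mr(n: int, k: int) -> bool:
    """MR(n, k): k independent rounds of the Miller-Rabin test."""
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for _ in range(k):
        a = random.randrange(1, n)   # a chosen uniformly from Z_n^+ = {1, ..., n-1}
        if not passes_mr_round(a, n):
            return False             # for a random odd composite, usually in round 1
    return True


def random_prime_rp(m: int, k: int) -> int:
    """Algorithm RP: sample n from {2, ..., m} until MR(n, k) returns true."""
    while True:
        n = random.randint(2, m)
        if mr(n, k):
            return n
```

With $m$ large, almost every rejected candidate fails inside the first call to `passes_mr_round`, so the cost of the remaining $k - 1$ rounds is paid essentially only for the prime that is finally returned.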


Exercise 10.5. Consider the problem of generating a random Sophie Germain prime between 2 and $m$ (see §5.5.5). One algorithm to do this is as follows:

    repeat
        n ←R {2, ..., m}
        if MR(n, k) then
            if MR(2n + 1, k) then
                output n and halt
    forever

Assuming Conjecture 5.24, show that this algorithm runs in expected time $O(\ell^5 + k\ell^4)$, and outputs a number that is not a Sophie Germain prime with probability $O(4^{-k}\ell^2)$. As usual, $\ell := \mathrm{len}(m)$.

Exercise 10.6. Improve the algorithm in the previous exercise, so that under the same assumptions, it runs in expected time $O(\ell^5 + k\ell^3)$, and outputs a number that is not a Sophie Germain prime with probability $O(4^{-k}\ell^2)$, or even better, show that this probability is at most $\gamma(m,k)\pi(m)/\pi^*(m) = O(\gamma(m,k)\ell)$, where $\pi^*(m)$ is defined as in §5.5.5.

Exercise 10.7. Suppose in Algorithm RFN in §9.6 we implement algorithm IsPrime(·) as MR(·, $k$), where $k$ is a parameter satisfying $4^{-k}(\log m + 1) \le 1/2$, and $m$ is the input to RFN. Show that the expected running time of Algorithm RFN in this case is $O(\ell^5 + k\ell^4\,\mathrm{len}(\ell))$. Hint: use Exercise 9.13.

10.3.2 Trial division up to a small bound
In generating a random prime, most candidates will in fact be composite, and so it makes sense to cast these out as quickly as possible. Significant efficiency gains can be achieved by testing if a given candidate $n$ is divisible by any prime up to a given bound $s$, before we subject $n$ to a Miller-Rabin test. This strategy makes sense, since for a small, “single precision” prime $p$, we can test if $p \mid n$ essentially in time $O(\mathrm{len}(n))$, while a single iteration of the Miller-Rabin test takes time $O(\mathrm{len}(n)^3)$.

To be more precise, let us define the following algorithm:

Algorithm MRS. On input $n, k, s$, where $n, k, s \in \mathbb{Z}$, and $n > 1$, $k \ge 1$, and $s > 1$, do the following:

    for each prime p ≤ s do
        if p | n then
            if p = n then return true else return false
    repeat k times
        α ←R Z_n^+
        if α ∉ L'_n return false
    return true

In an implementation of the above algorithm, one would most likely use the 
sieve of Eratosthenes (see §5.4) to generate the small primes. 
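Algorithm MRS might be sketched in Python as follows. This is an illustrative sketch under our own assumptions: the helper names are hypothetical, the small primes come from a sieve of Eratosthenes and may be passed in precomputed through the optional `primes` argument, and the Miller-Rabin round is the standard one for odd $n > 2$.

```python
import math
import random


def small_primes(s: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= s."""
    flags = bytearray([1]) * (s + 1)
    for i in range(2, math.isqrt(s) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, s + 1, i)))  # cross out multiples
    return [i for i in range(2, s + 1) if flags[i]]


def passes_mr_round(a: int, n: int) -> bool:
    """One Miller-Rabin round for odd n > 2: True iff [a]_n lies in L'_n."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t //= 2
        h += 1
    beta = pow(a, t, n)
    if beta in (1, n - 1):
        return True
    for _ in range(h - 1):
        beta = beta * beta % n
        if beta == n - 1:
            return True
    return False                     # once beta hits 1 it stays 1, so this is a rejection


def mrs(n: int, k: int, s: int, primes=None) -> bool:
    """MRS(n, k, s): trial division by the primes p <= s, then k Miller-Rabin rounds."""
    if primes is None:
        primes = small_primes(s)
    for p in primes:
        if n % p == 0:
            return p == n            # n is prime only if it is that small prime itself
    # No prime p <= s divides n, so n > s >= 2 and n is odd.
    for _ in range(k):
        if not passes_mr_round(random.randrange(1, n), n):
            return False
    return True
```

Computing the list of small primes once and reusing it across candidates avoids re-running the sieve for every $n$.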

Note that $\mathrm{MRS}(n,k,2)$ is equivalent to $\mathrm{MR}(n,k)$. Also, it is clear that the probability that $\mathrm{MRS}(n,k,s)$ makes a mistake is no more than the probability that $\mathrm{MR}(n,k)$ makes a mistake. Therefore, using MRS in place of MR will not increase the probability that the output of Algorithm RP is a composite; indeed, it is likely that this probability decreases significantly.

Let us now analyze the impact on the running time of Algorithm RP. To do this, we need to estimate the probability $\sigma(m,s)$ that a randomly chosen integer between 2 and $m$ is not divisible by any prime up to $s$. If $m$ is sufficiently large with respect to $s$, the following heuristic argument can be made rigorous, as we will discuss below. The probability that a random integer is divisible by a prime $p$ is about $1/p$, so the probability that it is not divisible by $p$ is about $1 - 1/p$. Assuming that these events are essentially independent for different values of $p$ (this is the heuristic part), we estimate

$$\sigma(m,s) \approx \prod_{p \le s} (1 - 1/p). \qquad (10.8)$$

Assuming for the time being that the approximation in (10.8) is sufficiently accurate, we may deduce, using Mertens' theorem (Theorem 5.13), that

$$\sigma(m,s) = O(1/\log s). \qquad (10.9)$$

Later, when we make this argument more rigorous, we shall see that (10.9) holds provided $s$ is not too large relative to $m$, and in particular, if $s = O((\log m)^c)$ for some constant $c$.

The estimate (10.9) gives us a bound on the probability that a random integer passes the trial division phase, and so must be subjected to Miller-Rabin; however, performing the trial division takes some time, so we also need to estimate the expected number $\kappa(m,s)$ of trial divisions performed on a random integer between 2 and $m$. Of course, in the worst case, we divide by all primes up to $s$, and so $\kappa(m,s) \le \pi(s) = O(s/\log s)$, but we can get a better bound, as follows. Let $p_1, p_2, \ldots, p_r$ be the primes up to $s$, and for $i = 1, \ldots, r$, let $q_i$ be the probability that we perform at least $i$ trial divisions. By Theorem 8.17, we have

$$\kappa(m,s) = \sum_{i=1}^{r} q_i.$$

Moreover, $q_1 = 1$, and $q_i = \sigma(m, p_{i-1})$ for $i = 2, \ldots, r$. From this, and (10.9), it follows that

$$\kappa(m,s) = 1 + \sum_{i=2}^{r} \sigma(m, p_{i-1}) = O\Bigl(\sum_{p \le s} 1/\log p\Bigr).$$

As a simple consequence of Chebyshev’s theorem (in particular, see Exercise 5.3), we obtain

$$\kappa(m,s) = O(s/(\log s)^2). \qquad (10.10)$$
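To see these quantities concretely, the following Python sketch computes $\sigma(m,s)$ and $\kappa(m,s)$ exactly by enumerating every $n \in \{2, \ldots, m\}$. It counts trial divisions the way Algorithm MRS performs them (stopping at the first prime that divides $n$) and compares $\sigma(m,s)$ with the heuristic product in (10.8). It is a brute-force illustration for small $m$ only; the function names are hypothetical.

```python
import math


def primes_up_to(s: int) -> list[int]:
    """Primes p <= s by simple trial division (fine for small s)."""
    return [p for p in range(2, s + 1) if all(p % d for d in range(2, math.isqrt(p) + 1))]


def sigma_and_kappa(m: int, s: int) -> tuple[float, float]:
    """Exact sigma(m, s) and kappa(m, s) for n uniform on {2, ..., m}."""
    ps = primes_up_to(s)
    rough = 0          # how many n are divisible by no prime p <= s
    divisions = 0      # total number of trial divisions performed over all n
    for n in range(2, m + 1):
        for i, p in enumerate(ps, start=1):
            if n % p == 0:
                divisions += i          # stopped after the i-th trial division
                break
        else:
            rough += 1
            divisions += len(ps)        # all r trial divisions were performed
    return rough / (m - 1), divisions / (m - 1)


def mertens_product(s: int) -> float:
    """The heuristic estimate (10.8): product of (1 - 1/p) over primes p <= s."""
    return math.prod(1 - 1 / p for p in primes_up_to(s))


for s in (10, 100, 1000):
    sigma, kappa = sigma_and_kappa(50_000, s)
    print(f"s={s:5d}  sigma={sigma:.4f}  product={mertens_product(s):.4f}  kappa={kappa:.2f}")
```

The identity $\kappa(m,s) = 1 + \sum_{i=2}^{r} \sigma(m, p_{i-1})$ holds exactly for these counts, since an integer receives at least $i$ trial divisions precisely when none of $p_1, \ldots, p_{i-1}$ divides it.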

We now derive a bound on the running time of Algorithm RP, assuming that IsPrime(·) is implemented using MRS(·, $k$, $s$). Let $\ell := \mathrm{len}(m)$. Our argument follows the same lines as the derivation of the estimate (10.6). Let us define random variables measuring various quantities during the first iteration of the main loop in Algorithm RP: $N_1$ is the value of $n$; $K_1$ is the number of Miller-Rabin tests actually performed; $Z_1$ is the running time. Also, let $C_1$ be the event that $N_1$ is composite, and let $D_1$ be the event that $N_1$ passes the trial division check. Then we have

$$\begin{aligned} E[K_1] &= E[K_1 \mid C_1 \cap D_1]\,P[C_1 \cap D_1] + E[K_1 \mid C_1 \cap \overline{D_1}]\,P[C_1 \cap \overline{D_1}] + E[K_1 \mid \overline{C_1}]\,P[\overline{C_1}] \\ &\le 4/3 \cdot P[C_1 \cap D_1] + 0 \cdot P[C_1 \cap \overline{D_1}] + k \cdot P[\overline{C_1}] \\ &\le 4/3 \cdot P[D_1] + k \cdot P[\overline{C_1}]. \end{aligned}$$

By (10.9) and Chebyshev’s theorem, it follows that

$$E[K_1] = O(1/\mathrm{len}(s) + k/\ell). \qquad (10.11)$$
Let us write $Z_1 = Z' + Z''$, where $Z'$ is the amount of time spent performing the Miller-Rabin test, and $Z''$ is the amount of time spent performing trial division. By (10.11), we have $E[Z'] = O(\ell^3/\mathrm{len}(s) + k\ell^2)$. Further, if each individual trial division step takes time $O(\ell)$, then by (10.10) we have $E[Z''] = O(\ell s/\mathrm{len}(s)^2)$. Hence,

$$E[Z_1] = O(\ell^3/\mathrm{len}(s) + k\ell^2 + \ell s/\mathrm{len}(s)^2).$$

It follows that if $Z$ is the total running time of Algorithm RP, then

$$E[Z] = O(\ell^4/\mathrm{len}(s) + k\ell^3 + \ell^2 s/\mathrm{len}(s)^2).$$

Clearly, we want to choose the parameter $s$ so that the time spent performing trial division is dominated by the time spent performing the Miller-Rabin test. To this end, let us assume that $\ell \le s \le \ell^2$. Then $\mathrm{len}(s) = \Theta(\mathrm{len}(\ell))$ and $\ell^2 s/\mathrm{len}(s)^2 = O(\ell^4/\mathrm{len}(\ell)^2)$, so we have

$$E[Z] = O(\ell^4/\mathrm{len}(\ell) + k\ell^3). \qquad (10.12)$$

This estimate does not take into account the time to generate the small primes using the sieve of Eratosthenes. These values might be pre-computed, in which case this time is zero, but even if we compute them on the fly, this takes time $O(s\,\mathrm{len}(\mathrm{len}(s)))$, which is dominated by the running time of the rest of the algorithm for the values of $s$ under consideration.

Thus, compared to (10.6), by sieving up to a bound $s$, where $\ell \le s \le \ell^2$, we effectively reduce the running time by a factor proportional to $\mathrm{len}(\ell)$, which is a very real and noticeable improvement in practice.

As we already mentioned, the above analysis is heuristic, but the results are correct. We shall now discuss how this analysis can be made rigorous; however, we should remark that any such rigorous analysis is mainly of theoretical interest: in any practical implementation, the optimal choice of the parameter $s$ is best determined by experiment, with the analysis being used only as a rough guide. Now, to make the analysis rigorous, we need to prove that the estimate (10.8) is sufficiently accurate. Proving such estimates takes us into the realm of “sieve theory.” The larger $m$ is with respect to $s$, the easier it is to prove such estimates. We shall prove only the simplest and most naive such estimate, but it is still good enough for our purposes.

Before stating any results, let us restate the problem slightly differently. For a given real number $y \ge 0$, let us call a positive integer “$y$-rough” if it is not divisible by any prime $p$ up to $y$. For all real numbers $x \ge 0$ and $y \ge 0$, let us define $R(x,y)$ to be the number of $y$-rough positive integers up to $x$. Thus, since $\sigma(m,s)$ is the probability that a random integer between 2 and $m$ is $s$-rough, and 1 is by definition $s$-rough, we have $\sigma(m,s) = (R(m,s) - 1)/(m - 1)$.
Theorem 10.5. For all real $x \ge 0$ and $y \ge 0$, we have

$$\Bigl| R(x,y) - x \prod_{p \le y} (1 - 1/p) \Bigr| \le 2^{\pi(y)}.$$

Proof. To simplify the notation, we shall use the Möbius function $\mu$ (see §2.9). Also, for a real number $u$, let us write $u = \lfloor u \rfloor + \{u\}$, where $0 \le \{u\} < 1$. Let $Q$ be the product of the primes up to the bound $y$.

Now, there are $\lfloor x \rfloor$ positive integers up to $x$, and of these, for each prime $p$ dividing $Q$, precisely $\lfloor x/p \rfloor$ are divisible by $p$, for each pair $p, p'$ of distinct primes dividing $Q$, precisely $\lfloor x/pp' \rfloor$ are divisible by $pp'$, and so on. By inclusion/exclusion (see Theorem 8.1), we have

$$R(x,y) = \sum_{d \mid Q} \mu(d) \lfloor x/d \rfloor = \sum_{d \mid Q} \mu(d)(x/d) - \sum_{d \mid Q} \mu(d)\{x/d\}.$$

Moreover,

$$\sum_{d \mid Q} \mu(d)(x/d) = x \sum_{d \mid Q} \mu(d)/d = x \prod_{p \le y} (1 - 1/p),$$

and

$$\Bigl| \sum_{d \mid Q} \mu(d)\{x/d\} \Bigr| \le \sum_{d \mid Q} 1 = 2^{\pi(y)}.$$

That proves the theorem. □
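Theorem 10.5 can be checked numerically for small $x$ and $y$. The Python sketch below counts $y$-rough integers directly, recomputes the same count with the inclusion/exclusion sum from the proof (enumerating the square-free divisors $d$ of $Q$ as subsets of the primes up to $y$, with $\mu(d) = (-1)^j$ for a product of $j$ primes), and verifies the error bound $2^{\pi(y)}$. Function names are hypothetical, and integer $x$ is assumed so that the floor divisions are exact.

```python
import math
from itertools import combinations


def primes_up_to(y: int) -> list[int]:
    """Primes p <= y by simple trial division."""
    return [p for p in range(2, y + 1) if all(p % d for d in range(2, math.isqrt(p) + 1))]


def rough_count_direct(x: int, y: int) -> int:
    """R(x, y): number of n in {1, ..., x} with no prime factor p <= y."""
    ps = primes_up_to(y)
    return sum(1 for n in range(1, x + 1) if all(n % p for p in ps))


def rough_count_mobius(x: int, y: int) -> int:
    """R(x, y) via inclusion/exclusion: sum over d | Q of mu(d) * floor(x / d)."""
    ps = primes_up_to(y)
    total = 0
    for j in range(len(ps) + 1):
        for subset in combinations(ps, j):      # d = product of j distinct primes
            total += (-1) ** j * (x // math.prod(subset))
    return total


def theorem_10_5_holds(x: int, y: int) -> bool:
    """Check |R(x, y) - x * prod(1 - 1/p)| <= 2^pi(y)."""
    ps = primes_up_to(y)
    main_term = x * math.prod(1 - 1 / p for p in ps)
    return abs(rough_count_direct(x, y) - main_term) <= 2 ** len(ps)
```

Since the error term has one contribution per square-free divisor of $Q$, the bound becomes useless once $2^{\pi(y)}$ exceeds $x$, which is why the theorem says something non-trivial only for quite small $y$.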

This theorem says something non-trivial only when $y$ is quite small. Nevertheless, using Chebyshev’s theorem on the density of primes, along with Mertens’ theorem, it is not hard to see that this theorem implies that (10.9) holds when $s = O((\log m)^c)$ for some constant $c$ (see Exercise 10.8), which implies the estimate (10.12) above, when $\ell \le s \le \ell^2$.

Exercise 10.8. Suppose that $s$ is a function of $m$ such that $s = O((\log m)^c)$ for some positive constant $c$. Show that $\sigma(m,s) = O(1/\log s)$.

Exercise 10.9. Let $f$ be a polynomial with integer coefficients. For real $x \ge 0$ and $y \ge 0$, define $R_f(x,y)$ to be the number of positive integers $t$ up to $x$ such that $f(t)$ is $y$-rough. For each positive integer $m$, define $\omega_f(m)$ to be the number of integers $t \in \{0, \ldots, m-1\}$ such that $f(t) \equiv 0 \pmod{m}$. Show that

$$\Bigl| R_f(x,y) - x \prod_{p \le y} (1 - \omega_f(p)/p) \Bigr| \le \prod_{p \le y} (1 + \omega_f(p)).$$
Exercise 10.10. Consider again the problem of generating a random Sophie Germain prime, as discussed in Exercises 10.5 and 10.6. A useful idea is to first test if either $n$ or $2n + 1$ is divisible by any small primes up to some bound $s$, before performing any more expensive tests. Using this idea, design and analyze an algorithm that improves the running time of the algorithm in Exercise 10.6 to $O(\ell^5/\mathrm{len}(\ell)^2 + k\ell^3)$, under the same assumptions, and achieving the same error probability bound as in that exercise. Hint: first show that the previous exercise implies that the number of positive integers $t$ up to $x$ such that both $t$ and $2t + 1$ are $y$-rough is at most

$$x \cdot \frac{1}{2} \prod_{2 < p \le y} (1 - 2/p) + 3^{\pi(y)}.$$

Exercise 10.11. Design an algorithm that takes as input a prime $q$ and a bound $m$, and outputs a random prime $p$ between 2 and $m$ such that $p \equiv 1 \pmod{q}$. Clearly, we need to assume that $m$ is sufficiently large with respect to $q$. Analyze your algorithm assuming Conjecture 5.22. State how large $m$ must be with respect to $q$, and under these assumptions, show that your algorithm runs in time $O(\ell^4/\mathrm{len}(\ell) + k\ell^3)$, and that its output is incorrect with probability $O(4^{-k}\ell)$. As usual, $\ell := \mathrm{len}(m)$.


10.3.3 Generating a random $\ell$-bit prime

In some applications, we want to generate a random prime of fixed size, for example, a random 1024-bit prime. More generally, let us consider the following problem: given an integer $\ell \ge 2$, generate a random $\ell$-bit prime, that is, a prime in the interval $[2^{\ell-1}, 2^\ell)$.

Bertrand’s postulate (Theorem 5.8) implies that there exists a constant $c > 0$ such that $\pi(2^\ell) - \pi(2^{\ell-1}) \ge c\,2^{\ell-1}/\ell$ for all $\ell \ge 2$.

Now let us modify Algorithm RP so that it takes as input an integer $\ell \ge 2$, and repeatedly generates a random $n$ in the interval $\{2^{\ell-1}, \ldots, 2^\ell - 1\}$ until IsPrime($n$) returns true. Let us call this variant Algorithm RP'. Further, let us implement IsPrime(·) as MR(·, $k$), for some auxiliary parameter $k$, and define $\gamma'(\ell,k)$ to be the probability that the output of Algorithm RP' (with this implementation of IsPrime) is composite.

Then using exactly the same reasoning as in §10.3.1, we have

$$\gamma'(\ell,k) \le 4^{-k}\,\frac{2^{\ell-1}}{\pi(2^\ell) - \pi(2^{\ell-1})} = O(4^{-k}\ell);$$

moreover, if the output of Algorithm RP' is prime, then every $\ell$-bit prime is equally likely, and the expected running time is $O(\ell^4 + k\ell^3)$. By doing some trial division as in §10.3.2, this can be reduced to $O(\ell^4/\mathrm{len}(\ell) + k\ell^3)$.
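A Python sketch of Algorithm RP' combined with the trial division of §10.3.2 might look as follows. The trial-division bound $s = \ell$ (the low end of the range $\ell \le s \le \ell^2$ discussed there), the default $k = 20$, and all function names are our own hypothetical choices; for cryptographic use one would draw candidates with the `secrets` module rather than `random`.

```python
import math
import random


def small_primes(s: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= s."""
    flags = bytearray([1]) * (s + 1)
    for i in range(2, math.isqrt(s) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, s + 1, i)))
    return [i for i in range(2, s + 1) if flags[i]]


def passes_mr_round(a: int, n: int) -> bool:
    """One Miller-Rabin round for odd n > 2."""
    t, h = n - 1, 0
    while t % 2 == 0:
        t //= 2
        h += 1
    beta = pow(a, t, n)
    if beta in (1, n - 1):
        return True
    for _ in range(h - 1):
        beta = beta * beta % n
        if beta == n - 1:
            return True
    return False


def random_l_bit_prime(ell: int, k: int = 20) -> int:
    """Algorithm RP' with trial division: a random prime in [2^(ell-1), 2^ell)."""
    primes = small_primes(max(ell, 2))               # trial-division bound s = ell (hypothetical)
    while True:
        n = random.randrange(1 << (ell - 1), 1 << ell)   # n uniform in {2^(ell-1), ..., 2^ell - 1}
        if any(n % p == 0 for p in primes):
            if n in primes:                          # n is itself one of the small primes
                return n
            continue                                 # cast out cheaply, no Miller-Rabin needed
        if all(passes_mr_round(random.randrange(1, n), n) for _ in range(k)):
            return n
```

Because every candidate is drawn uniformly from the $\ell$-bit range and trial division never rejects a prime, each $\ell$-bit prime remains equally likely to be returned.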

The function $\gamma'(\ell,k)$ has been studied a good deal; for example, the following explicit bound is known:

Theorem 10.6. For all $\ell \ge 2$, we have

$$\gamma'(\ell,1) \le \ell^2\,4^{2-\sqrt{\ell}}.$$


Proof. Literature — see §10.5. □ 

Upper bounds for $\gamma'(\ell,k)$ for specific values of $\ell$ and $k$ have been computed. The following table lists some known lower bounds for $-\log_2(\gamma'(\ell,k))$ for various values of $\ell$ and $k$:

    k \ ℓ      200   300   400   500   600
    1           11    19    37    56    75
    2           25    33    46    63    82
    3           34    44    55    70    88
    4           41    53    63    78    95
    5           47    60    72    85   102


Using exactly the same reasoning as the derivation of (10.7), one sees that

$$\gamma'(\ell,k) \le \frac{\gamma'(\ell,1)}{1-\gamma'(\ell,1)}\,4^{-k+1}.$$


10.4 Factoring and computing Euler’s phi function

In this section, we use some of the ideas developed to analyze the Miller-Rabin test to prove that the problem of factoring $n$ and the problem of computing $\varphi(n)$ are equivalent. By equivalent, we mean that given an efficient algorithm to solve one problem, we can efficiently solve the other, and vice versa.

Clearly, one direction is easy: if we can factor $n$ into primes, so

$$n = p_1^{e_1} \cdots p_r^{e_r}, \qquad (10.13)$$

then we can simply compute $\varphi(n)$ using the formula

$$\varphi(n) = p_1^{e_1-1}(p_1 - 1) \cdots p_r^{e_r-1}(p_r - 1).$$
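The formula is a one-liner once the factorization (10.13) is known. In this small Python sketch the factorization is passed as a dictionary mapping each prime $p_i$ to its exponent $e_i$, which is our own (hypothetical) representation choice.

```python
def phi_from_factorization(factorization: dict[int, int]) -> int:
    """phi(n) as the product of p^(e-1) * (p - 1) over the prime powers p^e in n."""
    result = 1
    for p, e in factorization.items():
        result *= p ** (e - 1) * (p - 1)
    return result


print(phi_from_factorization({2: 2, 3: 1}))   # n = 12; prints 4
```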

For the other direction, first consider the special case where $n = pq$, for distinct primes $p$ and $q$. Suppose we are given $n$ and $\varphi(n)$, so that we have two equations in the unknowns $p$ and $q$:

$$n = pq \quad\text{and}\quad \varphi(n) = (p-1)(q-1).$$

Substituting $n/p$ for $q$ in the second equation, and simplifying, we obtain

$$p^2 + (\varphi(n) - n - 1)p + n = 0,$$

which can be solved using the quadratic formula.
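In integer arithmetic the quadratic formula needs only an exact integer square root: writing $s = n - \varphi(n) + 1 = p + q$, the equation becomes $p^2 - sp + n = 0$, whose discriminant $s^2 - 4n$ equals $(p - q)^2$. A minimal Python sketch (function name hypothetical) using `math.isqrt`:

```python
import math


def factor_from_phi(n: int, phi: int) -> tuple[int, int]:
    """Recover (q, p) with q <= p from n = p*q and phi(n) = (p - 1)(q - 1)."""
    s = n - phi + 1                  # s = p + q
    disc = s * s - 4 * n             # (p - q)^2, discriminant of p^2 - s*p + n
    r = math.isqrt(disc) if disc >= 0 else -1
    if r < 0 or r * r != disc:
        raise ValueError("n and phi are not of the form pq, (p - 1)(q - 1)")
    p, q = (s + r) // 2, (s - r) // 2
    if p * q != n:
        raise ValueError("n and phi are not of the form pq, (p - 1)(q - 1)")
    return q, p


print(factor_from_phi(3233, 3120))   # 3233 = 53 * 61; prints (53, 61)
```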

For the general case, it is just as easy to prove a stronger result: given any non-zero multiple of the exponent of $\mathbb{Z}_n^*$, we can efficiently factor $n$. In particular, this will show that we can efficiently factor Carmichael numbers.

Before stating the algorithm in its full generality, we can convey the main idea by considering the special case where $n = pq$, where $p$ and $q$ are distinct primes, with $p \equiv q \equiv 3 \pmod 4$. Suppose we are given such an $n$, along with a non-zero multiple $f$ of the exponent of $\mathbb{Z}_n^*$. Now, $\mathbb{Z}_n^* \cong \mathbb{Z}_p^* \times \mathbb{Z}_q^*$, and since $\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$ and $\mathbb{Z}_q^*$ is a cyclic group of order $q - 1$, this means that $f$ is a non-zero common multiple of $p - 1$ and $q - 1$. Let $f = t2^h$, where $t$ is odd, and consider the following probabilistic algorithm:

    α ←R Z_n^+
    d ← gcd(rep(α), n)
    if d ≠ 1 then output d and halt
    β ← α^t
    d' ← gcd(rep(β) + 1, n)
    if d' ∉ {1, n} then output d' and halt
    output “failure”

Recall that $\operatorname{rep}(\alpha)$ denotes the canonical representative of $\alpha$, that is, the unique integer $a$ such that $[a]_n = \alpha$ and $0 \le a < n$. We shall prove that this algorithm outputs a non-trivial divisor of $n$ with probability at least $1/2$.

Let $\rho$ be the $t$-power map on $\mathbb{Z}_n^*$, and let $G := \rho^{-1}(\{\pm 1\})$. We shall show that

• $G \subsetneq \mathbb{Z}_n^*$, and

• if the algorithm chooses $\alpha \notin G$, then it splits $n$.

Since $G$ is a subgroup of $\mathbb{Z}_n^*$, it follows that $|G|/|\mathbb{Z}_n^+| \le |G|/|\mathbb{Z}_n^*| \le 1/2$, and this implies the algorithm succeeds with probability at least $1/2$.

Let $\theta : \mathbb{Z}_n \to \mathbb{Z}_p \times \mathbb{Z}_q$ be the ring isomorphism from the Chinese remainder theorem. The assumption that $p \equiv 3 \pmod 4$ means that $(p-1)/2$ is an odd integer, and since $f$ is a multiple of $p - 1$, it follows that $\gcd(t, p-1) = (p-1)/2$, and hence the image of $\mathbb{Z}_p^*$ under the $t$-power map is the subgroup of $\mathbb{Z}_p^*$ of order 2, which is $\{\pm 1\}$. Likewise, the image of $\mathbb{Z}_q^*$ under the $t$-power map is $\{\pm 1\}$. Thus,

$$\theta(\operatorname{Im}\rho) = \theta((\mathbb{Z}_n^*)^t) = (\theta(\mathbb{Z}_n^*))^t = (\mathbb{Z}_p^*)^t \times (\mathbb{Z}_q^*)^t = \{\pm 1\} \times \{\pm 1\},$$

and so $\operatorname{Im}\rho$ consists of the four elements:

$$1 = \theta^{-1}(1,1), \quad -1 = \theta^{-1}(-1,-1), \quad \theta^{-1}(-1,1), \quad \theta^{-1}(1,-1).$$
By the observations in the previous paragraph, not all elements of $\mathbb{Z}_n^*$ map to $\pm 1$ under $\rho$, which means that $G \subsetneq \mathbb{Z}_n^*$. Suppose that the algorithm chooses $\alpha \notin G$. We want to show that $n$ gets split. If $\alpha \notin \mathbb{Z}_n^*$, then $\gcd(\operatorname{rep}(\alpha), n)$ is a non-trivial divisor of $n$, and the algorithm splits $n$. So let us assume that $\alpha \in \mathbb{Z}_n^* \setminus G$. Consider the value $\beta = \alpha^t = \rho(\alpha)$ computed by the algorithm. Since $\alpha \notin G$, we have $\beta \ne \pm 1$, and by the observations in the previous paragraph, we have $\theta(\beta) = (-1, 1)$ or $\theta(\beta) = (1, -1)$. In the first case, $\theta(\beta + 1) = (0, 2)$, and so $\gcd(\operatorname{rep}(\beta) + 1, n) = p$, while in the second case, $\theta(\beta + 1) = (2, 0)$, and so $\gcd(\operatorname{rep}(\beta) + 1, n) = q$. In either case, the algorithm splits $n$.
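The special-case algorithm translates directly into Python. This sketch (hypothetical name) performs one run and returns a non-trivial divisor, or `None` for “failure”; since each run succeeds with probability at least $1/2$, a few repetitions suffice in practice. It assumes, as in the text, that $n = pq$ with $p \equiv q \equiv 3 \pmod 4$ and that $f$ is a non-zero multiple of the exponent of $\mathbb{Z}_n^*$.

```python
import math
import random


def split_once_3mod4(n: int, f: int):
    """One run of the special-case algorithm; returns a non-trivial divisor of n or None."""
    t = f
    while t % 2 == 0:            # f = t * 2^h with t odd
        t //= 2
    a = random.randrange(1, n)   # alpha chosen from Z_n^+
    d = math.gcd(a, n)
    if d != 1:
        return d                 # alpha is not a unit: n is split already
    beta = pow(a, t, n)          # beta = alpha^t lies in Im(rho), which has four elements
    d = math.gcd(beta + 1, n)
    if d not in (1, n):
        return d                 # theta(beta) was (-1, 1) or (1, -1)
    return None                  # beta = 1 or beta = -1: "failure"
```

For example, $n = 77 = 7 \cdot 11$ has both prime factors congruent to 3 modulo 4, and $f = \operatorname{lcm}(6, 10) = 30$ is a valid input.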

We now consider the general case, where $n$ is an arbitrary positive integer. Let $\lambda(n)$ denote the exponent of $\mathbb{Z}_n^*$. If the prime factorization of $n$ is as in (10.13), then by the Chinese remainder theorem, we have

$$\lambda(n) = \operatorname{lcm}(\lambda(p_1^{e_1}), \ldots, \lambda(p_r^{e_r})).$$

Moreover, for every prime power $p^e$, by Theorem 7.28, we have

$$\lambda(p^e) = \begin{cases} p^{e-1}(p-1) & \text{if } p \ne 2 \text{ or } e \le 2, \\ 2^{e-2} & \text{if } p = 2 \text{ and } e \ge 3. \end{cases}$$

In particular, if $d \mid n$, then $\lambda(d) \mid \lambda(n)$.
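These two formulas give $\lambda(n)$ directly from the factorization (10.13). A minimal Python sketch (hypothetical name; factorization passed as a prime-to-exponent dictionary; `math.lcm` requires Python 3.9 or later):

```python
import math


def carmichael_lambda(factorization: dict[int, int]) -> int:
    """Exponent lambda(n) of Z_n^*, computed from the prime factorization of n."""
    result = 1
    for p, e in factorization.items():
        if p == 2 and e >= 3:
            lam = 2 ** (e - 2)
        else:
            lam = p ** (e - 1) * (p - 1)
        result = math.lcm(result, lam)
    return result


print(carmichael_lambda({3: 1, 11: 1, 17: 1}))   # n = 561; prints 80
```

Since 80 divides $560 = 561 - 1$, the integer $n - 1$ is itself a non-zero multiple of $\lambda(561)$; this is the sense in which Carmichael numbers come with a usable multiple of the exponent.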

Now, assume we are given $n$, along with a non-zero multiple $f$ of $\lambda(n)$. We would like to calculate the complete prime factorization of $n$. We may proceed recursively: first, if $n = 1$, we may obviously halt; otherwise, we test if $n$ is prime, using an efficient primality test, and if so, halt (if we are using the Miller-Rabin test, then we may erroneously halt even when $n$ is composite, but we can ensure that this happens with negligible probability); otherwise, we split $n$ as $n = d_1 d_2$, using an algorithm to be described below, and then recursively factor both $d_1$ and $d_2$; since $\lambda(d_1) \mid f$ and $\lambda(d_2) \mid f$, we may use the same value $f$ in the recursion.

So let us assume that $n > 1$ and $n$ is not prime, and our goal now is to use $f$ to obtain a non-trivial factorization of $n$. If $n$ is even, then we can certainly do this. Moreover, if $n$ is a perfect power, that is, if $n = a^b$ for some integers $a > 1$ and $b > 1$, we can also obtain a non-trivial factorization of $n$ (see Exercise 3.31).

So let us assume not only that $n > 1$ and $n$ is not prime, but also that $n$ is odd, and $n$ is not a perfect power. Let $f = t2^h$, where $t$ is odd. Consider the following probabilistic algorithm:
    α ←R Z_n^+
    d ← gcd(rep(α), n)
    if d ≠ 1 then output d and halt
    β ← α^t
    for j ← 0 to h − 1 do
        d' ← gcd(rep(β) + 1, n)
        if d' ∉ {1, n} then output d' and halt
        β ← β^2
    output “failure”


We want to show that this algorithm outputs a non-trivial factor of $n$ with probability at least $1/2$. To do this, suppose the prime factorization of $n$ is as in (10.13). Then by our assumptions about $n$, we have $r \ge 2$ and each $p_i$ is odd. Let $\lambda(p_i^{e_i}) = t_i 2^{h_i}$, where $t_i$ is odd, for $i = 1, \ldots, r$, and let $g := \max\{h_1, \ldots, h_r\}$. Note that since $\lambda(n) \mid f$, we have $1 \le g \le h$.

Let $\rho$ be the $(t2^{g-1})$-power map on $\mathbb{Z}_n^*$, and let $G := \rho^{-1}(\{\pm 1\})$. As above, we shall show that

• $G \subsetneq \mathbb{Z}_n^*$, and

• if the algorithm chooses $\alpha \notin G$, then it splits $n$,

which will prove that the algorithm splits $n$ with probability at least $1/2$.

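Let

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_r^{e_r}}$$

be the ring isomorphism of the Chinese remainder theorem. We have

$$\theta(\operatorname{Im}\rho) = G_1 \times \cdots \times G_r,$$

where

$$G_i := \bigl(\mathbb{Z}_{p_i^{e_i}}^*\bigr)^{t2^{g-1}} \quad \text{for } i = 1, \ldots, r.$$

Let us assume the $p_i$'s are ordered so that $h_i = g$ for $i = 1, \ldots, r'$, and $h_i < g$ for $i = r' + 1, \ldots, r$, where we have $1 \le r' \le r$. Then, since each $\mathbb{Z}_{p_i^{e_i}}^*$ is cyclic of order $t_i 2^{h_i}$ and $t_i \mid t$, we have $G_i = \{\pm 1\}$ for $i = 1, \ldots, r'$, and $G_i = \{1\}$ for $i = r' + 1, \ldots, r$.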

By the observations in the previous paragraph, and the fact that $r \ge 2$, the image of $\rho$ contains elements other than $\pm 1$; for example, $\theta^{-1}(-1, 1, \ldots, 1)$ is such an element. This means that $G \subsetneq \mathbb{Z}_n^*$. Suppose the algorithm chooses $\alpha \in \mathbb{Z}_n^+ \setminus G$. We want to show that $n$ gets split. If $\alpha \notin \mathbb{Z}_n^*$, then $\gcd(\operatorname{rep}(\alpha), n)$ is a non-trivial divisor of $n$, and so the algorithm certainly splits $n$. So assume $\alpha \in \mathbb{Z}_n^* \setminus G$. In loop iteration $j = g - 1$, the value of $\beta$ is equal to $\rho(\alpha)$, and writing $\theta(\beta) = (\beta_1, \ldots, \beta_r)$, we have $\beta_i = \pm 1$ for $i = 1, \ldots, r$. Let $S$ be the set of indices $i$ such that $\beta_i = -1$.
As $\alpha \notin G$, we know that $\beta \ne \pm 1$, and so $\emptyset \subsetneq S \subsetneq \{1, \ldots, r\}$. Thus,

$$\gcd(\operatorname{rep}(\beta) + 1, n) = \prod_{i \in S} p_i^{e_i}$$

is a non-trivial factor of $n$. This means that the algorithm splits $n$ in loop iteration $j = g - 1$ (if not in some earlier loop iteration).

So we have shown that the above algorithm splits $n$ with probability at least $1/2$. If we repeat the algorithm until $n$ gets split, the expected number of runs required will be at most 2. Combining this with the above recursive algorithm, we get an algorithm that completely factors an arbitrary $n$ in expected polynomial time.
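Putting the pieces together, here is a Python sketch of the complete recursive procedure: a primality check, the even and perfect-power cases, and the randomized splitting loop repeated until it succeeds. All names are hypothetical. The primality check is a probabilistic Miller-Rabin test with 30 rounds, perfect powers are detected with an integer $b$-th root computed by binary search (our own choice, not necessarily the method of Exercise 3.31), and the code assumes $f$ is a non-zero multiple of $\lambda(n)$; for other values of $f$ the splitting loop may never terminate.

```python
import math
import random
from collections import Counter


def is_probable_prime(n: int, k: int = 30) -> bool:
    """Trial division by a few small primes, then k Miller-Rabin rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    t, h = n - 1, 0
    while t % 2 == 0:
        t, h = t // 2, h + 1
    for _ in range(k):
        b = pow(random.randrange(1, n), t, n)
        if b in (1, n - 1):
            continue
        for _ in range(h - 1):
            b = b * b % n
            if b == n - 1:
                break
        else:
            return False            # never reached -1: n is composite
    return True


def integer_root(n: int, b: int) -> int:
    """Largest a >= 1 with a**b <= n, by binary search (n >= 1)."""
    lo, hi = 1, 1 << (n.bit_length() // b + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** b <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def split(n: int, f: int) -> int:
    """Non-trivial divisor of composite n, given a non-zero multiple f of lambda(n)."""
    if n % 2 == 0:
        return 2
    for b in range(2, n.bit_length() + 1):       # is n = a^b with a, b > 1?
        a = integer_root(n, b)
        if a > 1 and a ** b == n:
            return a
    t, h = f, 0
    while t % 2 == 0:                             # f = t * 2^h with t odd
        t, h = t // 2, h + 1
    while True:                                   # each run succeeds w.p. >= 1/2
        alpha = random.randrange(1, n)
        d = math.gcd(alpha, n)
        if d != 1:
            return d
        beta = pow(alpha, t, n)
        for _ in range(h):
            d = math.gcd(beta + 1, n)
            if d not in (1, n):
                return d
            beta = beta * beta % n


def factor_with_lambda_multiple(n: int, f: int) -> Counter:
    """Complete factorization {prime: exponent} of n >= 1."""
    if n == 1:
        return Counter()
    if is_probable_prime(n):
        return Counter({n: 1})
    d = split(n, f)          # lambda(d) and lambda(n // d) both divide f
    return factor_with_lambda_multiple(d, f) + factor_with_lambda_multiple(n // d, f)
```

For a Carmichael number such as 561, passing $f = n - 1 = 560$ is enough to obtain the factorization $3 \cdot 11 \cdot 17$.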


Exercise 10.12. Suppose you are given an integer $n$ of the form $n = pq$, where $p$ and $q$ are distinct, $\ell$-bit primes, with $p = 2p' + 1$ and $q = 2q' + 1$, where $p'$ and $q'$ are themselves prime. Suppose that you are also given an integer $t$ such that $\gcd(t, p'q') \ne 1$. Show how to efficiently factor $n$.

Exercise 10.13. Suppose there is a probabilistic algorithm $A$ that takes as input an integer $n$ of the form $n = pq$, where $p$ and $q$ are distinct, $\ell$-bit primes, with $p = 2p' + 1$ and $q = 2q' + 1$, where $p'$ and $q'$ are prime. The algorithm also takes as input $\alpha, \beta \in (\mathbb{Z}_n^*)^2$. It outputs either “failure,” or integers $x, y$, not both zero, such that $\alpha^x \beta^y = 1$. Furthermore, assume that $A$ runs in expected polynomial time, and that for all $n$ of the above form, and for randomly chosen $\alpha, \beta \in (\mathbb{Z}_n^*)^2$, $A$ succeeds in finding $x, y$ as above with probability $\epsilon(n)$. Here, the probability is taken over the random choice of $\alpha$ and $\beta$, as well as the random choices made during the execution of $A$ on input $(n, \alpha, \beta)$. Show how to use $A$ to construct another probabilistic algorithm $A'$ that takes as input $n$ as above, runs in expected polynomial time, and that satisfies the following property:

if $\epsilon(n) \ge 0.001$, then $A'$ factors $n$ with probability at least 0.999.


10.5 Notes

The Miller-Rabin test is due to Miller [67] and Rabin [79]. The paper by Miller defined the set $L'_n$, but did not give a probabilistic analysis. Rather, Miller showed that under a generalization of the Riemann hypothesis, for composite $n$, the least positive integer $a$ such that $[a]_n \in \mathbb{Z}_n \setminus L'_n$ is at most $O((\log n)^2)$, thus giving rise to a deterministic primality test whose correctness depends on the above unproved hypothesis. The later paper by Rabin re-interprets Miller’s result in the context of probabilistic algorithms.
Bach [10] gives an explicit version of Miller’s result, showing that under the same assumptions, the least positive integer $a$ such that $[a]_n \in \mathbb{Z}_n \setminus L'_n$ is at most $2(\log n)^2$; more generally, Bach shows that the following holds under a generalization of the Riemann hypothesis:

For every positive integer $n$, and every subgroup $G \subsetneq \mathbb{Z}_n^*$, the least positive integer $a$ such that $[a]_n \in \mathbb{Z}_n \setminus G$ is at most $2(\log n)^2$, and the least positive integer $b$ such that $[b]_n \in \mathbb{Z}_n^* \setminus G$ is at most $3(\log n)^2$.

The first efficient probabilistic primality test was invented by Solovay and Strassen 
[99] (their paper was actually submitted for publication in 1974). Later, in Chap- 
ter 21, we shall discuss a recently discovered, deterministic, polynomial-time 
(though not very practical) primality test, whose analysis does not rely on any 
unproved hypothesis. 

Carmichael numbers are named after R. D. Carmichael, who was the first to 
discuss them, in work published in the early 20th century. Alford, Granville, and 
Pomerance [7] proved that there are infinitely many Carmichael numbers.

Exercise 10.4 is based on Lehmann [58]. 

Theorem 10.4, as well as the table of values just below it, are from Kim and Pomerance [55]. In fact, these bounds hold for the weaker test based on $L_n$.

Our analysis in §10.3.2 is loosely based on a similar analysis in §4.1 of Maurer [65]. Theorem 10.5 and its generalization in Exercise 10.9 are certainly not the best results possible in this area. The general goal of “sieve theory” is to prove useful upper and lower bounds for quantities like $R_f(x,y)$ that hold when $y$ is as large as possible with respect to $x$. For example, using a technique known as Brun’s pure sieve, one can show that for $\log y < \sqrt{\log x}$, there exist $\beta$ and $\beta'$, both of absolute value at most 1, such that

$$R_f(x,y) = \bigl(1 + \beta e^{-\sqrt{\log x}}\bigr)\, x \prod_{p \le y} (1 - \omega_f(p)/p) + \beta' \sqrt{x}.$$

Thus, this gives us very sharp estimates for $R_f(x,y)$ when $x$ tends to infinity, and $y$ is bounded by any fixed polynomial in $\log x$. For a proof of this result, see §2.2 of Halberstam and Richert [44] (the result itself is stated as equation 2.16). Brun’s pure sieve is really just the first non-trivial sieve result, developed in the early 20th century; even stronger results, extending the useful range of $y$ (but with larger error terms), have subsequently been proved.

Theorem 10.6, as well as the table of values immediately below it, are from Damgård, Landrock, and Pomerance [32].

The algorithm presented in §10.4 for factoring an integer given a multiple of $\varphi(n)$ (or, for that matter, $\lambda(n)$) is essentially due to Miller [67]. However, just as for his primality test, Miller presents his algorithm as a deterministic algorithm, which he analyzes under a generalization of the Riemann hypothesis. The probabilistic version of Miller’s factoring algorithm appears to be “folklore.”



11

Finding generators and discrete logarithms in $\mathbb{Z}_p^*$


As we have seen in Theorem 7.28, for a prime $p$, $\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$. This means that there exists a generator $\gamma \in \mathbb{Z}_p^*$, such that each $\alpha \in \mathbb{Z}_p^*$ can be written uniquely as $\alpha = \gamma^x$, where $x$ is an integer with $0 \le x < p - 1$; the integer $x$ is called the discrete logarithm of $\alpha$ to the base $\gamma$, and is denoted $\log_\gamma \alpha$.

This chapter discusses some computational problems in this setting; namely, how to efficiently find a generator $\gamma$, and given $\gamma$ and $\alpha$, how to compute $\log_\gamma \alpha$.

More generally, if $\gamma$ generates a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, where $q \mid (p - 1)$, and $\alpha \in G$, then $\log_\gamma \alpha$ is defined to be the unique integer $x$ with $0 \le x < q$ and $\alpha = \gamma^x$. In some situations it is more convenient to view $\log_\gamma \alpha$ as an element of $\mathbb{Z}_q$. Also for $x \in \mathbb{Z}_q$, with $x = [a]_q$, one may write $\gamma^x$ to denote $\gamma^a$. There can be no confusion, since if $x = [a']_q$, then $\gamma^{a'} = \gamma^a$. However, in this chapter, we shall view $\log_\gamma \alpha$ as an integer.

Although we work in the group Z*, all of the algorithms discussed in this chapter 
trivially generalize to any finite cyclic group that has a suitably compact repre- 
sentation of group elements and an efficient algorithm for performing the group 
operation on these representations. 


11.1 Finding a generator for $\mathbb{Z}_p^*$

In this section, we consider the problem of how to find a generator for $\mathbb{Z}_p^*$. There is no efficient algorithm known for this problem, unless the prime factorization of $p - 1$ is given, and even then, we must resort to the use of a probabilistic algorithm. Of course, factoring in general is believed to be a very difficult problem, so it may not be easy to get the prime factorization of $p - 1$. However, if our goal is to construct a large prime $p$, together with a generator for $\mathbb{Z}_p^*$, then we may use Algorithm RFN in §9.6 to generate a random factored number $n$ in some range, test $n + 1$ for primality, and then repeat until we get a factored number $n$ such that $p = n + 1$ is prime. In this way, we can generate a random prime $p$ in a given range along with the factorization of $p - 1$.

We now present an efficient probabilistic algorithm that takes as input an odd prime $p$, along with the prime factorization

$$p - 1 = \prod_{i=1}^{r} q_i^{e_i},$$

and outputs a generator for $\mathbb{Z}_p^*$. It runs as follows:

    for i ← 1 to r do
        repeat
            choose α ∈ Z_p^* at random
            compute β ← α^((p−1)/q_i)
        until β ≠ 1
        γ_i ← α^((p−1)/q_i^(e_i))
    γ ← γ_1 · · · γ_r
    output γ

First, let us analyze the correctness of this algorithm. When the $i$th loop iteration terminates, by construction, we have

$$\gamma_i^{q_i^{e_i}} = 1 \quad\text{but}\quad \gamma_i^{q_i^{e_i - 1}} \ne 1.$$

It follows (see Theorem 6.37) that $\gamma_i$ has multiplicative order $q_i^{e_i}$. From this, it follows (see Theorem 6.38) that $\gamma$ has multiplicative order $p - 1$. Thus, we have shown that if the algorithm terminates, its output is always correct.

Let us now analyze the running time of this algorithm. Fix $i = 1, \ldots, r$, and consider the repeat/until loop in the $i$th iteration of the outer loop. Let $L_i$ be the random variable whose value is the number of iterations of this repeat/until loop. Since $\alpha$ is chosen at random from $\mathbb{Z}_p^*$, the value of $\beta$ is uniformly distributed over the image of the $(p-1)/q_i$-power map (see Theorem 8.5), and since the latter is a subgroup of $\mathbb{Z}_p^*$ of order $q_i$ (see Example 7.61), we see that $\beta = 1$ with probability $1/q_i$. Thus, $L_i$ has a geometric distribution with associated success probability $1 - 1/q_i$, and $E[L_i] = 1/(1 - 1/q_i) \le 2$ (see Theorem 9.3).

Now set $L := L_1 + \cdots + L_r$. By linearity of expectation (Theorem 8.14), we have $E[L] = E[L_1] + \cdots + E[L_r] \le 2r$. The running time $Z$ of the entire algorithm is $O(L \cdot \mathrm{len}(p)^3)$, and hence the expected running time is $E[Z] = O(r\,\mathrm{len}(p)^3)$, and since $r \le \log_2 p$, we have $E[Z] = O(\mathrm{len}(p)^4)$.
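A direct Python transcription of the generator-finding algorithm might look like this. It is a sketch under our own conventions: the function name is hypothetical, the factorization of $p - 1$ is passed as a dictionary $\{q_i: e_i\}$, and Python's `random` module supplies the random choices.

```python
import random


def find_generator(p: int, factorization: dict[int, int]) -> int:
    """Generator of Z_p^*, given p - 1 = product of q^e over factorization.items()."""
    gamma = 1
    for q, e in factorization.items():
        while True:                              # expected at most 2 iterations
            alpha = random.randrange(1, p)       # alpha uniform in Z_p^*
            if pow(alpha, (p - 1) // q, p) != 1:
                break
        gamma_i = pow(alpha, (p - 1) // q ** e, p)   # has order exactly q^e
        gamma = gamma * gamma_i % p
    return gamma
```

For example, with $p = 97$ and $p - 1 = 2^5 \cdot 3$, one would call `find_generator(97, {2: 5, 3: 1})`.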

Although this algorithm is quite practical, there are asymptotically faster algo- 
rithms for this problem (see Exercise 11.2). 
Exercise 11.1. Suppose we are not given the prime factorization of $p - 1$, but rather, just a prime $q$ dividing $p - 1$, and we want to find an element of multiplicative order $q$ in $\mathbb{Z}_p^*$. Design and analyze an efficient algorithm to do this.

Exercise 11.2. Suppose we are given a prime $p$, along with the prime factorization $p - 1 = \prod_{i=1}^{r} q_i^{e_i}$.

(a) If, in addition, we are given $\alpha \in \mathbb{Z}_p^*$, show how to compute the multiplicative order of $\alpha$ in time $O(r\,\mathrm{len}(p)^3)$. Hint: use Exercise 6.40.

(b) Improve the running time bound to $O(\mathrm{len}(r)\,\mathrm{len}(p)^3)$. Hint: use Exercise 3.39.

(c) Modifying the algorithm you developed for part (b), show how to construct a generator for $\mathbb{Z}_p^*$ in expected time $O(\mathrm{len}(r)\,\mathrm{len}(p)^3)$.

Exercise 11.3. Suppose we are given a positive integer $n$, along with its prime factorization $n = p_1^{e_1} \cdots p_r^{e_r}$, and that for each $i = 1, \ldots, r$, we are also given the prime factorization of $p_i - 1$. Show how to efficiently compute the multiplicative order of any element $\alpha \in \mathbb{Z}_n^*$.

Exercise 11.4. Suppose there is an efficient algorithm that takes as input a positive integer $n$ and an element $\alpha \in \mathbb{Z}_n^*$, and computes the multiplicative order of $\alpha$. Show how to use this algorithm to build an efficient integer factoring algorithm.


11.2 Computing discrete logarithms in $\mathbb{Z}_p^*$

In this section, we consider algorithms for computing the discrete logarithm of
$\alpha \in \mathbb{Z}_p^*$ to a given base $\gamma$. The algorithms we present here are, in the worst case,
exponential-time algorithms, and are by no means the best possible; however, in 
some special cases, these algorithms are not so bad. 


11.2.1 Brute-force search 

Suppose that $\gamma \in \mathbb{Z}_p^*$ generates a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q > 1$ (not necessarily
prime), and we are given $p$, $q$, $\gamma$, and $\alpha \in G$, and wish to compute $\log_\gamma \alpha$.

The simplest algorithm to solve this problem is brute-force search: 

i ← 0, β ← 1
while β ≠ α do
    β ← β · γ, i ← i + 1
output i
This algorithm is clearly correct, and the main loop will always halt after at
most $q$ iterations (assuming, as we are, that $\alpha \in G$). So the total running time is
$O(q\,\mathrm{len}(p)^2)$.
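The brute-force search translates directly into Python. The sketch below assumes $\gamma$ has order $q$ and adds one guard that is not in the pseudocode (raising an error if $\alpha \notin G$), so that a bad input cannot loop forever.

```python
def dlog_brute_force(p, q, gamma, alpha):
    """Return log_gamma(alpha), where gamma has order q in Z_p^* and alpha is in <gamma>."""
    i, beta = 0, 1
    while beta != alpha % p:
        beta = beta * gamma % p     # beta = gamma^i once i is incremented below
        i += 1
        if i == q:                  # wrapped around to gamma^q = 1: alpha is not in <gamma>
            raise ValueError("alpha is not in the subgroup generated by gamma")
    return i
```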


11.2.2 Baby step/giant step method 

As above, suppose that $\gamma \in \mathbb{Z}_p^*$ generates a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q > 1$ (not
necessarily prime), and we are given $p$, $q$, $\gamma$, and $\alpha \in G$, and wish to compute
$\log_\gamma \alpha$.

A faster algorithm than brute-force search is the baby step/giant step method. 
It works as follows. 

Let us choose an approximation $m$ to $q^{1/2}$. It does not have to be a very good
approximation; we just need $m = \Theta(q^{1/2})$. Also, let $m' = \lceil q/m \rceil$, so that
$m' = \Theta(q^{1/2})$ as well.

The idea is to compute all the values $\gamma^i$ for $i = 0, \ldots, m - 1$ (the “baby steps”)
and to build an “associative array” (or “lookup table”) $T$ that maps the key $\gamma^i$
to the value $i$. For $\beta \in \mathbb{Z}_p^*$, we shall write $T[\beta]$ to denote the value associated
with the key $\beta$, writing $T[\beta] = \bot$ if there is no such value. We shall assume
that $T$ is implemented so that accessing $T[\beta]$ is fast. Using an appropriate data
structure, T can be implemented so that accessing individual elements takes time 
0(len(p)). One such data structure is a radix tree (also called a search trie). Other 
data structures may be used (for example, a hash table or a binary search tree), but 
these may have somewhat different access times. 

We can build the associative array T using the following algorithm: 

initialize T    // T[β] = ⊥ for all β ∈ Z_p^*
β ← 1
for i ← 0 to m − 1 do
    T[β] ← i
    β ← β · γ

Clearly, this algorithm takes time $O(q^{1/2}\,\mathrm{len}(p)^2)$.

After building the lookup table, we execute the following procedure (the “giant 
steps”): 

γ′ ← γ^{−m}
β ← α, j ← 0, i ← T[β]
while i = ⊥ do
    β ← β · γ′, j ← j + 1, i ← T[β]
x ← jm + i
output x
To analyze this procedure, suppose that $\alpha = \gamma^x$ with $0 \le x < q$. Now, $x$ can be
written in a unique way as $x = vm + u$, where $u$ and $v$ are integers with $0 \le u < m$
and $0 \le v < m'$. In the $j$th loop iteration, for $j = 0, 1, \ldots$, we have

$$\beta = \alpha\gamma^{-mj} = \gamma^{(v-j)m + u}.$$

For $j < v$, the exponent $(v - j)m + u$ lies in the range $[m, q)$, so $\beta$ is not among the
baby steps $\gamma^0, \ldots, \gamma^{m-1}$. So we will detect $i \ne \bot$ precisely when $j = v$, in which case $i = u$. Thus, the
output will be correct, and the total running time of the algorithm (for both the
“baby steps” and “giant steps” parts) is easily seen to be $O(q^{1/2}\,\mathrm{len}(p)^2)$.

While this algorithm is much faster than brute-force search, it has the drawback
that it requires space for about $q^{1/2}$ elements of $\mathbb{Z}_p$. Of course, there is a
“time/space trade-off” here: by choosing $m$ smaller, we get a table of size $O(m)$,
but the running time will be proportional to $q/m$. In §11.2.5 below, we discuss
an algorithm that runs (at least heuristically) in time $O(q^{1/2}\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$, but
which requires space for only a constant number of elements of $\mathbb{Z}_p$.
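A minimal Python sketch of the baby step/giant step method follows. The choice $m = \lceil q^{1/2} \rceil$ is one assumption among many valid ones, and a Python `dict` (a hash table) stands in for the radix tree suggested above, so individual lookups take expected rather than worst-case time. `pow(x, -k, p)` requires Python 3.8 or later.

```python
from math import isqrt

def dlog_bsgs(p, q, gamma, alpha):
    """Baby step/giant step: log_gamma(alpha), where gamma has order q in Z_p^*."""
    m = isqrt(q - 1) + 1            # m = ceil(sqrt(q)), so m = Theta(q^(1/2))
    m_prime = -(-q // m)            # m' = ceil(q / m)
    # Baby steps: the table maps gamma^i to i for i = 0, ..., m - 1.
    table = {}
    beta = 1
    for i in range(m):
        table[beta] = i
        beta = beta * gamma % p
    # Giant steps: beta = alpha * gamma^(-m*j) for j = 0, 1, ..., m' - 1.
    giant = pow(gamma, -m, p)       # gamma^(-m) mod p
    beta = alpha % p
    for j in range(m_prime):
        i = table.get(beta)
        if i is not None:
            return j * m + i
        beta = beta * giant % p
    raise ValueError("alpha is not in the subgroup generated by gamma")
```

For instance, $\gamma = 2^6 = 64$ has order $q = 101$ modulo $p = 607$, and `dlog_bsgs(607, 101, 64, pow(64, 37, 607))` returns 37.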


11.2.3 Groups of order $q^e$

Suppose that $\gamma \in \mathbb{Z}_p^*$ generates a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q^e$, where $q > 1$ and
$e \ge 1$, and we are given $p$, $q$, $e$, $\gamma$, and $\alpha \in G$, and wish to compute $\log_\gamma \alpha$.

There is a simple algorithm that allows one to reduce this problem to the problem 
of computing discrete logarithms in the subgroup of $\mathbb{Z}_p^*$ of order $q$.

It is perhaps easiest to describe the algorithm recursively. The base case is when
$e = 1$, in which case, we use an algorithm for the subgroup of $\mathbb{Z}_p^*$ of order $q$. For
this, we might employ the algorithm in §11.2.2, or if $q$ is very small, the algorithm
in §11.2.1.

Suppose now that $e > 1$. We choose an integer $f$ with $0 < f < e$. Different
strategies for choosing $f$ yield different algorithms; we discuss this below. Suppose
$\alpha = \gamma^x$, where $0 \le x < q^e$. Then we can write $x = q^f v + u$, where $u$ and $v$ are
integers with $0 \le u < q^f$ and $0 \le v < q^{e-f}$. Therefore, since $\gamma^{q^e} = 1$,

$$\alpha^{q^{e-f}} = \gamma^{q^{e-f} u}.$$

Note that $\gamma^{q^{e-f}}$ has multiplicative order $q^f$, and so if we recursively compute the
discrete logarithm of $\alpha^{q^{e-f}}$ to the base $\gamma^{q^{e-f}}$, we obtain $u$.

Having obtained $u$, observe that

$$\alpha/\gamma^u = \gamma^{q^f v}.$$

Note also that $\gamma^{q^f}$ has multiplicative order $q^{e-f}$, and so if we recursively compute
the discrete logarithm of $\alpha/\gamma^u$ to the base $\gamma^{q^f}$, we obtain $v$, from which we then
compute $x = q^f v + u$.
Let us put together the above ideas succinctly in a recursive procedure:

Algorithm RDL. On input $p, q, e, \gamma, \alpha$ as above, do the following:

if e = 1 then
    return log_γ α    // base case: use a different algorithm
else
    select f ∈ {1, ..., e − 1}
    u ← RDL(p, q, f, γ^{q^{e−f}}, α^{q^{e−f}})    // 0 ≤ u < q^f
    v ← RDL(p, q, e − f, γ^{q^f}, α/γ^u)    // 0 ≤ v < q^{e−f}
    return q^f v + u

To analyze the running time of this recursive algorithm, note that the running 
time of the body of one recursive invocation (not counting the running time of the 
recursive calls it makes) is $O(e\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$. To calculate the total running time,
we have to sum up the running times of all the recursive calls plus the running 
times of all the base cases. 

Regardless of the strategy for choosing $f$, the total number of base case invocations
is $e$. Note that all the base cases compute discrete logarithms to the base $\gamma^{q^{e-1}}$.
Assuming we implement the base case using the baby step/giant step algorithm in
§11.2.2, the total running time for all the base cases is therefore $O(e q^{1/2}\,\mathrm{len}(p)^2)$.

The total running time for the recursion (not including the base case computations)
depends on the strategy used to choose the split $f$. It is helpful to represent
the behavior of the algorithm using a recursion tree. This is a binary tree, where
every node represents one recursive invocation of the algorithm; the root of the
tree represents the initial invocation of the algorithm; for every node $N$ in the tree,
if $N$ represents the recursive invocation $I$, then $N$’s children (if any) represent
the recursive invocations made by $I$. We can naturally organize the nodes of the
recursion tree by levels: the root of the recursion tree is at level 0, its children are 
at level 1, its grandchildren at level 2, and so on. The depth of the recursion tree is 
defined to be the maximum level of any node. 

We consider two different strategies for choosing the split $f$:

• If we always choose $f = 1$ or $f = e - 1$, then the depth of the recursion
tree is $O(e)$. The running time contributed by each level of the recursion
tree is $O(e\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$, and so the total running time for the recursion is
$O(e^2\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$. Note that if $f = 1$, then the algorithm is essentially
tail recursive, and so may be easily converted to an iterative algorithm without
the need for a stack.

• If we use a “balanced” divide-and-conquer strategy, choosing $f \approx e/2$,
then the depth of the recursion tree is $O(\mathrm{len}(e))$, while the running time
contributed by each level of the recursion tree is still $O(e\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$.
Thus, the total running time for the recursion is $O(e\,\mathrm{len}(e)\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$.
Assuming we use the faster, balanced recursion strategy, and that we use the
baby step/giant step algorithm for the base case, the total running time of
Algorithm RDL is:

$$O\bigl((e q^{1/2} + e\,\mathrm{len}(e)\,\mathrm{len}(q)) \cdot \mathrm{len}(p)^2\bigr).$$
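Algorithm RDL can be sketched in Python as follows, assuming the balanced split $f = \lfloor e/2 \rfloor$. To keep the block self-contained, the base case uses brute force (§11.2.1) rather than baby step/giant step; the helper name `_dlog_base_case` is hypothetical.

```python
def _dlog_base_case(p, q, gamma, alpha):
    """Base case (e = 1): brute force in the subgroup of order q, a simple
    stand-in for the baby step/giant step method of §11.2.2."""
    beta = 1
    for i in range(q):
        if beta == alpha:
            return i
        beta = beta * gamma % p
    raise ValueError("alpha is not in the subgroup generated by gamma")

def rdl(p, q, e, gamma, alpha):
    """Algorithm RDL: log_gamma(alpha), where gamma has multiplicative order q**e in Z_p^*."""
    alpha %= p
    if e == 1:
        return _dlog_base_case(p, q, gamma, alpha)
    f = e // 2                                   # balanced split; 1 <= f <= e - 1
    q_f, q_ef = q ** f, q ** (e - f)
    # u = log of alpha^(q^(e-f)) to the base gamma^(q^(e-f)), which has order q^f
    u = rdl(p, q, f, pow(gamma, q_ef, p), pow(alpha, q_ef, p))
    # v = log of alpha / gamma^u to the base gamma^(q^f), which has order q^(e-f)
    v = rdl(p, q, e - f, pow(gamma, q_f, p), alpha * pow(gamma, -u, p) % p)
    return q_f * v + u
```

For example, $2$ generates $\mathbb{Z}_{101}^*$, so $16 = 2^4$ has order $25 = 5^2$, and `rdl(101, 5, 2, 16, pow(16, 13, 101))` returns 13.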


11.2.4 Discrete logarithms in $\mathbb{Z}_p^*$

Suppose that we are given a prime $p$, along with the prime factorization

$$p - 1 = \prod_{i=1}^r q_i^{e_i},$$

a generator $\gamma$ for $\mathbb{Z}_p^*$, and $\alpha \in \mathbb{Z}_p^*$. We wish to compute $\log_\gamma \alpha$.

Suppose that $\alpha = \gamma^x$, where $0 \le x < p - 1$. Then for $i = 1, \ldots, r$, we have

$$\alpha^{(p-1)/q_i^{e_i}} = \bigl(\gamma^{(p-1)/q_i^{e_i}}\bigr)^x.$$

Note that $\gamma^{(p-1)/q_i^{e_i}}$ has multiplicative order $q_i^{e_i}$, and if $x_i$ is the discrete logarithm
of $\alpha^{(p-1)/q_i^{e_i}}$ to the base $\gamma^{(p-1)/q_i^{e_i}}$, then we have $0 \le x_i < q_i^{e_i}$ and $x \equiv x_i \pmod{q_i^{e_i}}$.

Thus, if we compute the values $x_1, \ldots, x_r$, using Algorithm RDL in §11.2.3,
we can obtain $x$ using the algorithm of the Chinese remainder theorem (see Theorem 4.6).
If we define $q := \max\{q_1, \ldots, q_r\}$, then the running time of this algorithm
will be bounded by $q^{1/2}\,\mathrm{len}(p)^{O(1)}$.

We conclude that 

the difficulty of computing discrete logarithms in $\mathbb{Z}_p^*$ is determined
by the size of the largest prime dividing $p - 1$.
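The reduction of this subsection, combined with the Chinese remainder theorem, can be sketched in Python as below. To keep the block self-contained, it assumes an unrolled, iterative form of Algorithm RDL with $f = 1$ (the tail-recursive strategy mentioned in §11.2.3) and brute force for the prime-order base case; factorizations are passed as a dictionary `{q: e}`, and the helper names are hypothetical.

```python
def _dlog_prime_order(p, q, gamma, alpha):
    """Brute-force log in a subgroup of prime order q (stand-in for §11.2.2)."""
    beta = 1
    for i in range(q):
        if beta == alpha:
            return i
        beta = beta * gamma % p
    raise ValueError("alpha is not in the subgroup generated by gamma")

def _dlog_prime_power(p, q, e, gamma, alpha):
    """Log to a base gamma of order q**e, one base-q digit at a time (RDL with f = 1, unrolled)."""
    x = 0
    base = pow(gamma, q ** (e - 1), p)               # order q: the base of every base case
    for k in range(e):
        rest = alpha * pow(gamma, -x, p) % p         # only digits k, k+1, ... of the log remain
        digit = _dlog_prime_order(p, q, base, pow(rest, q ** (e - 1 - k), p))
        x += digit * q ** k
    return x

def dlog_mod_p(p, factors, gamma, alpha):
    """log_gamma(alpha) for a generator gamma of Z_p^*, where p - 1 = prod(q**e) over factors."""
    x, modulus = 0, 1
    for q, e in factors.items():
        qe = q ** e
        g_i = pow(gamma, (p - 1) // qe, p)           # has order q^e
        a_i = pow(alpha, (p - 1) // qe, p)
        x_i = _dlog_prime_power(p, q, e, g_i, a_i)   # x = x_i (mod q^e)
        # Chinese remaindering: lift x (mod modulus) to x (mod modulus * qe).
        t = (x_i - x) * pow(modulus, -1, qe) % qe
        x, modulus = x + modulus * t, modulus * qe
    return x
```

The cost is dominated by the prime-order searches, which matches the conclusion above: only the largest $q_i$ matters.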


11.2.5 A space-efficient square-root time algorithm 
We present a more space-efficient alternative to the algorithm in §11.2.2, the analysis
of which we leave as a series of exercises for the reader.

The algorithm makes a somewhat heuristic assumption that we have a function 
that “behaves” for all practical purposes like a random function. Such functions can 
indeed be constructed using cryptographic techniques under reasonable intractabil- 
ity assumptions; however, for the particular application here, one can get by in 
practice with much simpler constructions. 

Let $p$ be a prime, $q$ a prime dividing $p - 1$, $\gamma$ an element of $\mathbb{Z}_p^*$ that generates a
subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and $\alpha \in G$. Let $F$ be a function mapping elements of
$G$ to $\{0, \ldots, q - 1\}$. Define $H : G \to G$ to be the function that sends $\beta$ to $\beta\alpha\gamma^{F(\beta)}$.
The algorithm runs as follows:

i ← 1
x ← 0, β ← α
x′ ← F(β), β′ ← H(β)
while β ≠ β′ do
    x ← (x + F(β)) mod q, β ← H(β)
    repeat 2 times
        x′ ← (x′ + F(β′)) mod q, β′ ← H(β′)
    i ← i + 1
if i < q then
    output (x − x′) · i^{−1} mod q
else
    output “fail”
To analyze this algorithm, let us define $\beta_1, \beta_2, \ldots$ as follows: $\beta_1 := \alpha$ and for
$i > 1$, $\beta_i := H(\beta_{i-1})$.

Exercise 11.5. Show that each time the main loop of the algorithm is entered,
we have $\beta = \beta_i = \gamma^x\alpha^i$, and $\beta' = \beta_{2i} = \gamma^{x'}\alpha^{2i}$.

Exercise 11.6. Show that if the loop terminates with $i < q$, the value output is
equal to $\log_\gamma \alpha$.

Exercise 11.7. Let $j$ be the smallest index such that $\beta_j = \beta_k$ for some index
$k < j$. Show that $j \le q + 1$ and that the loop terminates with $i < j$ (and in
particular, $i \le q$).

Exercise 11.8. Assume that $F$ is a random function, meaning that it is chosen at
random, uniformly from among all functions from $G$ into $\{0, \ldots, q - 1\}$. Show that
this implies that $H$ is a random function, meaning that it is uniformly distributed
over all functions from $G$ into $G$.

Exercise 11.9. Assuming that $F$ is a random function as in the previous exercise,
apply the result of Exercise 8.45 to conclude that the expected running time of the
algorithm is $O(q^{1/2}\,\mathrm{len}(q)\,\mathrm{len}(p)^2)$, and that the probability that the algorithm fails
is exponentially small in $q$.
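A minimal Python sketch of the algorithm of §11.2.5 follows. The function $F$ is only required to behave like a random function; purely as an assumption for illustration, the sketch derives $F(\beta)$ from a SHA-256 hash of $\beta$ (via the standard `hashlib` module), and it returns `None` where the pseudocode outputs “fail”. Only a constant number of group elements is stored, in contrast with the table of §11.2.2.

```python
import hashlib

def _F(beta, q):
    """Hypothetical stand-in for F : G -> {0, ..., q-1}: SHA-256 of beta, reduced mod q."""
    digest = hashlib.sha256(str(beta).encode()).digest()
    return int.from_bytes(digest, "big") % q

def dlog_rho(p, q, gamma, alpha):
    """log_gamma(alpha) for gamma of prime order q in Z_p^*, or None on "fail"."""
    alpha %= p

    def H(beta):                               # H(beta) = beta * alpha * gamma^F(beta)
        return beta * alpha * pow(gamma, _F(beta, q), p) % p

    i = 1
    x, beta = 0, alpha                         # invariant: beta = gamma^x * alpha^i
    x2, beta2 = _F(beta, q), H(beta)           # invariant: beta2 = gamma^x2 * alpha^(2i)
    while beta != beta2:
        x, beta = (x + _F(beta, q)) % q, H(beta)
        for _ in range(2):
            x2, beta2 = (x2 + _F(beta2, q)) % q, H(beta2)
        i += 1
    if i < q:
        # gamma^x alpha^i = gamma^x2 alpha^(2i) gives log_gamma(alpha) = (x - x2) / i mod q
        return (x - x2) * pow(i, -1, q) % q
    return None
```

The tuple assignments evaluate $F(\beta)$ and $H(\beta)$ on the old value of $\beta$, matching the order of updates in the pseudocode.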


11.3 The Diffie-Hellman key establishment protocol 

One of the main motivations for studying algorithms for computing discrete loga- 
rithms is the relation between this problem and the problem of breaking a protocol 
called the Diffie-Hellman key establishment protocol, named after its inventors. 
In this protocol, Alice and Bob need never have talked to each other before, but
nevertheless can establish a shared secret key that nobody else can easily compute.
To use this protocol, a third party must provide a “telephone book,” which contains 
the following information: 

• $p$, $q$, and $\gamma$, where $p$ and $q$ are primes with $q \mid (p - 1)$, and $\gamma$ is an element
generating a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$;

• an entry for each user, such as Alice or Bob, that contains the user’s name, 
along with a “public key” for that user, which is an element of the group G. 

To use this system, Alice posts her public key in the telephone book, which is
of the form $\alpha = \gamma^x$, where $x \in \{0, \ldots, q - 1\}$ is chosen by Alice at random. The
value $x$ is Alice’s “secret key,” which Alice never divulges to anybody. Likewise,
Bob posts his public key, which is of the form $\beta = \gamma^y$, where $y \in \{0, \ldots, q - 1\}$ is
chosen by Bob at random, and is his secret key.

To establish a shared key known only between them, Alice retrieves Bob’s public
key $\beta$ from the telephone book, and computes $k_A := \beta^x$. Likewise, Bob retrieves
Alice’s public key $\alpha$, and computes $k_B := \alpha^y$. It is easy to see that

$$k_A = \beta^x = (\gamma^y)^x = \gamma^{xy} = (\gamma^x)^y = \alpha^y = k_B,$$

and hence Alice and Bob share the same secret key $k := k_A = k_B$.
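As a toy illustration of the protocol, the Python sketch below uses deliberately tiny, hypothetical parameters ($p = 607$, $q = 101$, $\gamma = 2^6 \bmod 607 = 64$) that offer no security whatsoever; it only checks the identity $k_A = k_B$ derived above. Secret keys are drawn with the standard `secrets` module.

```python
import secrets

# Toy "telephone book" parameters (hypothetical, far too small for real use):
# q = 101 is prime and divides p - 1 = 606, and gamma = 2^6 mod 607 = 64 has order 101.
P, Q, GAMMA = 607, 101, 64

def keygen():
    """Return (secret key x, public key gamma^x), with x uniform in {0, ..., q-1}."""
    x = secrets.randbelow(Q)
    return x, pow(GAMMA, x, P)

def shared_key(my_secret, their_public):
    """Alice computes k_A = beta^x; Bob computes k_B = alpha^y."""
    return pow(their_public, my_secret, P)

x, alpha = keygen()      # Alice
y, beta = keygen()       # Bob
assert shared_key(x, beta) == shared_key(y, alpha)   # both equal gamma^(x*y)
```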

Using this shared secret key, they can then use standard methods for encryption
and message authentication to hold a secure conversation. We shall not go any
further into how this is done; rather, we briefly (and only superficially) discuss
some aspects of the security of the key establishment protocol itself. Clearly, if
an attacker obtains $\alpha$ and $\beta$ from the telephone book, and computes $x = \log_\gamma \alpha$,
then he can compute Alice and Bob’s shared key as $k = \beta^x$; in fact, given $x$, an
attacker can efficiently compute any key shared between Alice and another user.

Thus, if this system is to be secure, it should be very difficult to compute discrete
logarithms. However, the assumption that computing discrete logarithms is hard is
not enough to guarantee security. Indeed, it is not entirely inconceivable that the
discrete logarithm problem is hard, and yet the problem of computing $k$ from $\alpha$
and $\beta$ is easy. The latter problem, computing $k$ from $\alpha$ and $\beta$, is called the
Diffie-Hellman problem.

As in the discussion of the RSA cryptosystem in §4.7, the reader is warned that 
the above discussion about security is a bit of an oversimplification. A complete 
discussion of all the security issues related to the above protocol is beyond the 
scope of this text. 

Note that in our presentation of the Diffie-Hellman protocol, we work with a
generator of a subgroup $G$ of $\mathbb{Z}_p^*$ of prime order, rather than a generator for $\mathbb{Z}_p^*$.
There are several reasons for doing this: one is that there are no known discrete
logarithm algorithms that are any more practical in $G$ than in $\mathbb{Z}_p^*$, provided the order
$q$ of $G$ is sufficiently large; another is that by working in $G$, the protocol becomes
substantially more efficient. In typical implementations, $p$ is 1024 bits long, so as
to protect against subexponential-time algorithms such as those discussed later in
§15.2, while $q$ is 160 bits long, which is enough to protect against the square-root-time
algorithms discussed in §11.2.2 and §11.2.5. The modular exponentiations
in the protocol will run several times faster using “short,” 160-bit exponents rather
than “long,” 1024-bit exponents.

For the following exercise, we need the following notions from complexity the- 
ory. 

• We say problem A is deterministic poly-time reducible to problem B if 
there exists a deterministic algorithm R for solving problem A that makes 
calls to a subroutine for problem B, where the running time of R (not
including the running time for the subroutine for B) is polynomial in the 
input length. 

• We say that problems A and B are deterministic poly-time equivalent if A 
is deterministic poly-time reducible to B and B is deterministic poly-time 
reducible to A. 

Exercise 11.10. Consider the following problems.

(a) Given a prime $p$, a prime $q$ that divides $p - 1$, an element $\gamma \in \mathbb{Z}_p^*$ generating
a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and two elements $\alpha, \beta \in G$, compute
$\gamma^{xy}$, where $x := \log_\gamma \alpha$ and $y := \log_\gamma \beta$. (This is just the Diffie-Hellman
problem.)

(b) Given a prime $p$, a prime $q$ that divides $p - 1$, an element $\gamma \in \mathbb{Z}_p^*$ generating
a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and an element $\alpha \in G$, compute $\gamma^{x^2}$, where
$x := \log_\gamma \alpha$.

(c) Given a prime $p$, a prime $q$ that divides $p - 1$, an element $\gamma \in \mathbb{Z}_p^*$ generating
a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and two elements $\alpha, \beta \in G$, with $\beta \ne 1$,
compute $\gamma^{xy'}$, where $x := \log_\gamma \alpha$, $y' := y^{-1} \bmod q$, and $y := \log_\gamma \beta$.

(d) Given a prime $p$, a prime $q$ that divides $p - 1$, an element $\gamma \in \mathbb{Z}_p^*$ generating
a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and an element $\alpha \in G$, with $\alpha \ne 1$, compute
$\gamma^{x'}$, where $x' := x^{-1} \bmod q$ and $x := \log_\gamma \alpha$.

Show that these problems are deterministic poly-time equivalent. Moreover, your
reductions should preserve the values of $p$, $q$, and $\gamma$: that is, if the algorithm that
reduces one problem to another takes as input an instance of the former problem
of the form $(p, q, \gamma, \ldots)$, it should invoke the subroutine for the latter problem with
inputs of the form $(p, q, \gamma, \ldots)$.
Exercise 11.11. Suppose there is a probabilistic algorithm $A$ that takes as input a
prime $p$, a prime $q$ that divides $p - 1$, and an element $\gamma \in \mathbb{Z}_p^*$ generating a subgroup
$G$ of $\mathbb{Z}_p^*$ of order $q$. The algorithm also takes as input $\alpha \in G$. It outputs either
“failure,” or $\log_\gamma \alpha$. Furthermore, assume that $A$ runs in expected polynomial time,
and that for all $p$, $q$, and $\gamma$ of the above form, and for randomly chosen $\alpha \in G$, $A$
succeeds in computing $\log_\gamma \alpha$ with probability $\epsilon(p, q, \gamma)$. Here, the probability is
taken over the random choice of $\alpha$, as well as the random choices made during the
execution of $A$. Show how to use $A$ to construct another probabilistic algorithm
$A'$ that takes as input $p$, $q$, and $\gamma$ as above, as well as $\alpha \in G$, runs in expected
polynomial time, and that satisfies the following property:

if $\epsilon(p, q, \gamma) \ge 0.001$, then for all $\alpha \in G$, $A'$ computes $\log_\gamma \alpha$ with
probability at least 0.999.

The algorithm A' in the previous exercise is an example of a random self- 
reduction, which means an algorithm that reduces the task of solving an arbitrary 
instance of a given problem to that of solving a random instance of the same prob- 
lem. Intuitively, the existence of such a reduction means that the problem is no 
harder in the worst case than on average. 

Exercise 11.12. Let $p$ be a prime, $q$ a prime that divides $p - 1$, $\gamma \in \mathbb{Z}_p^*$ an
element that generates a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and $\alpha \in G$. For $\delta \in G$,
a representation of $\delta$ with respect to $\gamma$ and $\alpha$ is a pair of integers $(r, s)$, with
$0 \le r < q$ and $0 \le s < q$, such that $\gamma^r\alpha^s = \delta$.

(a) Show that for every $\delta \in G$, there are precisely $q$ representations $(r, s)$ of $\delta$
with respect to $\gamma$ and $\alpha$, and among these, there is precisely one with $s = 0$.

(b) Show that given a representation $(r, s)$ of 1 with respect to $\gamma$ and $\alpha$ such that
$s \ne 0$, we can efficiently compute $\log_\gamma \alpha$.

(c) Show that given any $\delta \in G$, along with any two distinct representations of
$\delta$ with respect to $\gamma$ and $\alpha$, we can efficiently compute $\log_\gamma \alpha$.

(d) Suppose we are given access to an “oracle” that, when presented with any
$\delta \in G$, tells us some representation of $\delta$ with respect to $\gamma$ and $\alpha$. Show how
to use this oracle to efficiently compute $\log_\gamma \alpha$.

The following two exercises examine the danger of the use of “short” exponents 
in discrete logarithm based cryptographic schemes that do not work with a group 
of prime order. 

Exercise 11.13. Let $p$ be a prime and let $p - 1 = q_1^{e_1} \cdots q_r^{e_r}$ be the prime factorization
of $p - 1$. Let $\gamma$ be a generator for $\mathbb{Z}_p^*$. Let $y$ be a positive number, and
let $Q_p(y)$ be the product of all the prime powers $q_i^{e_i}$ with $q_i \le y$. Suppose you are
given $p$, $y$, the primes $q_i$ dividing $p - 1$ with $q_i \le y$, along with $\gamma$, an element $\alpha$ of
$\mathbb{Z}_p^*$, and a bound $\hat{x}$, where $x := \log_\gamma \alpha < \hat{x}$. Show how to compute $x$ in time

$$\bigl(y^{1/2} + (\hat{x}/Q_p(y))^{1/2}\bigr) \cdot \mathrm{len}(p)^{O(1)}.$$

Exercise 11.14. Continuing with the previous exercise, let $Q'_p(y)$ denote the product of
all the primes $q_i$ dividing $p - 1$ with $q_i \le y$. Note that $Q'_p(y) \mid Q_p(y)$. The goal of
this exercise is to estimate the expected value of $\log Q'_p(y)$, assuming $p$ is a large,
random prime. To this end, let $R$ be a random variable that is uniformly distributed
over all $\ell$-bit primes, and assume that $y < \ldots$. Assuming Conjecture 5.22, show
that asymptotically (as $\ell \to \infty$), we have $E[\log Q'_R(y)] = \log y + O(1)$.

The results of the previous two exercises caution against the use of “short” exponents
in cryptographic schemes based on the discrete logarithm problem for $\mathbb{Z}_p^*$.
For example, suppose that $p$ is a random 1024-bit prime, and that for reasons
of efficiency, one chooses $\hat{x} \approx 2^{160}$, thinking that a method such as the baby
step/giant step method would require $\approx 2^{80}$ steps to recover $x$. However, if we
choose $y \approx 2^{80}$, then the above analysis implies that $Q_p(y)$ is at least $\approx 2^{80}$ with
a reasonable probability, in which case $\hat{x}/Q_p(y)$ is at most $\approx 2^{80}$, and so we can
in fact recover $x$ in $\approx 2^{40}$ steps (there are known methods to find the primes up to
$y$ that divide $p - 1$ quickly enough). While $2^{80}$ may not be a feasible number of
steps, $2^{40}$ may very well be. Of course, none of these issues arise if one works in a
subgroup of $\mathbb{Z}_p^*$ of large prime order, which is the recommended practice.
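Plugging the illustrative sizes above into the bound of Exercise 11.13 gives the following back-of-the-envelope estimate (assuming, as in the text, that $Q_p(y) \gtrsim 2^{80}$ for $y \approx 2^{80}$):

$$\bigl(y^{1/2} + (\hat{x}/Q_p(y))^{1/2}\bigr)\cdot\mathrm{len}(p)^{O(1)} \lesssim \bigl(2^{40} + (2^{160}/2^{80})^{1/2}\bigr)\cdot\mathrm{len}(p)^{O(1)} = 2^{41}\cdot\mathrm{len}(p)^{O(1)},$$

compared with roughly $\hat{x}^{1/2} \approx 2^{80}$ steps for a baby step/giant step search over all of $\{0, \ldots, \hat{x} - 1\}$.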

An interesting fact about the Diffie-Hellman problem is that there is no known 
efficient algorithm to recognize a solution to the problem. Some cryptographic 
protocols actually rely on the apparent difficulty of this decision problem, which 
is called the decisional Diffie-Hellman problem. The following three exercises 
develop a random self-reducibility property for this decision problem. 

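Exercise 11.15. Let $p$ be a prime, $q$ a prime dividing $p - 1$, and $\gamma$ an element of
$\mathbb{Z}_p^*$ that generates a subgroup $G$ of order $q$. Let $\alpha \in G$, and let $H$ be the subgroup
of $G \times G$ generated by $(\gamma, \alpha)$. Let $\hat{\gamma}, \hat{\alpha}$ be arbitrary elements of $G$, and define the
map

$$\rho : \mathbb{Z}_q \times \mathbb{Z}_q \to G \times G, \qquad ([r]_q, [s]_q) \mapsto (\gamma^r\hat{\gamma}^s, \alpha^r\hat{\alpha}^s).$$

Show that the definition of $\rho$ is unambiguous, that $\rho$ is a group homomorphism,
and that

• if $(\hat{\gamma}, \hat{\alpha}) \in H$, then $\operatorname{Im} \rho = H$, and

• if $(\hat{\gamma}, \hat{\alpha}) \notin H$, then $\operatorname{Im} \rho = G \times G$.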
Exercise 11.16. For $p, q, \gamma$ as in the previous exercise, let $\mathcal{D}_{p,q,\gamma}$ be the set of all
triples of the form $(\gamma^x, \gamma^y, \gamma^{xy})$, and let $\mathcal{R}_{p,q,\gamma}$ be the set of all triples of the form
$(\gamma^x, \gamma^y, \gamma^z)$. Using the result from the previous exercise, design a probabilistic
algorithm that runs in expected polynomial time, and that on input $p, q, \gamma$, along
with a triple $\Gamma \in \mathcal{R}_{p,q,\gamma}$, outputs a triple $\Gamma^* \in \mathcal{R}_{p,q,\gamma}$ such that

• if $\Gamma \in \mathcal{D}_{p,q,\gamma}$, then $\Gamma^*$ is uniformly distributed over $\mathcal{D}_{p,q,\gamma}$, and

• if $\Gamma \notin \mathcal{D}_{p,q,\gamma}$, then $\Gamma^*$ is uniformly distributed over $\mathcal{R}_{p,q,\gamma}$.

Exercise 11.17. Suppose that $A$ is a probabilistic algorithm that takes as input
$p, q, \gamma$ as in the previous exercise, along with a triple $\Gamma^* \in \mathcal{R}_{p,q,\gamma}$, and outputs either
0 or 1. Furthermore, assume that $A$ runs in expected polynomial time. Define two
random variables, $X_{p,q,\gamma}$ and $Y_{p,q,\gamma}$, as follows:

• $X_{p,q,\gamma}$ is defined to be the output of $A$ on input $p, q, \gamma$, and $\Gamma^*$, where $\Gamma^*$ is
uniformly distributed over $\mathcal{D}_{p,q,\gamma}$, and

• $Y_{p,q,\gamma}$ is defined to be the output of $A$ on input $p, q, \gamma$, and $\Gamma^*$, where $\Gamma^*$ is
uniformly distributed over $\mathcal{R}_{p,q,\gamma}$.

In both cases, the value of the random variable is determined by the random choice
of $\Gamma^*$, as well as the random choices made by the algorithm. Define

$$\epsilon(p, q, \gamma) := P[X_{p,q,\gamma} = 1] - P[Y_{p,q,\gamma} = 1].$$

Using the result of the previous exercise, show how to use $A$ to design a probabilistic,
expected polynomial-time algorithm $A'$ that takes as input $p, q, \gamma$ as above, along
with $\Gamma \in \mathcal{R}_{p,q,\gamma}$, and outputs either “yes” or “no,” so that

if $\epsilon(p, q, \gamma) \ge 0.001$, then for all $\Gamma \in \mathcal{R}_{p,q,\gamma}$, the probability that $A'$
correctly determines whether $\Gamma \in \mathcal{D}_{p,q,\gamma}$ is at least 0.999.

Hint: use the Chernoff bound.

The following exercise demonstrates that the problem of distinguishing “Diffie- 
Hellman triples” from “random triples” is hard only if the order of the underlying 
group is not divisible by any small primes, which is another reason we have chosen 
to work with groups of large prime order. 

Exercise 11.18. Assume the notation of the previous exercise, but let us drop
the restriction that $q$ is prime. Design and analyze a deterministic algorithm $A$
that takes inputs $p, q, \gamma$ and $\Gamma^* \in \mathcal{R}_{p,q,\gamma}$, that outputs 0 or 1, and that satisfies
the following property: if $t$ is the smallest prime dividing $q$, then $A$ runs in time
$(t + \mathrm{len}(p))^{O(1)}$, and the “distinguishing advantage” $\epsilon(p, q, \gamma)$ for $A$ on inputs $p, q, \gamma$
is at least $1/t$.
11.4 Notes

The probabilistic algorithm in §11.1 for finding a generator for $\mathbb{Z}_p^*$ can be made
deterministic under a generalization of the Riemann hypothesis. Indeed, as discussed
in §10.5, under such a hypothesis, Bach’s result [10] implies that for each
prime $q \mid (p - 1)$, the least positive integer $a$ such that $[a]_p \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^q$ is at most
2 log p. 

Related to the problem of constructing a generator for $\mathbb{Z}_p^*$ is the question of how
big is the smallest positive integer $g$ such that $[g]_p$ is a generator for $\mathbb{Z}_p^*$; that is,
how big is the smallest (positive) primitive root modulo $p$. The best bounds on the
least primitive root are also obtained using the same generalization of the Riemann
hypothesis mentioned above. Under this hypothesis, Wang [104] showed that the
least primitive root modulo $p$ is $O(r^6\,\mathrm{len}(p)^2)$, where $r$ is the number of distinct
prime divisors of $p - 1$. Shoup [95] improved Wang’s bound to $O(r^4\,\mathrm{len}(r)^4\,\mathrm{len}(p)^2)$
by adapting a result of Iwaniec [50, 51] and applying it to Wang’s proof. The
best unconditional bound on the smallest primitive root modulo $p$ is $p^{1/4 + o(1)}$ (this
bound is also in Wang [104]). Of course, even if there exists a small primitive root,
there is no known way to efficiently recognize a primitive root modulo $p$ without
knowing the prime factorization of $p - 1$.

As we already mentioned, all of the algorithms presented in this chapter are
completely “generic,” in the sense that they work in any finite cyclic group; we
really did not exploit any properties of $\mathbb{Z}_p^*$ other than the fact that it is a cyclic
group. In fact, as far as such “generic” algorithms go, the algorithms presented
here for discrete logarithms are optimal [71, 98]. However, there are faster,
“non-generic” algorithms (though still not polynomial time) for discrete logarithms in
$\mathbb{Z}_p^*$. We shall examine one such algorithm later, in Chapter 15.

The “baby step/giant step” algorithm in §11.2.2 is due to Shanks [91]. See, for 
example, the book by Cormen, Leiserson, Rivest, and Stein [29] for appropriate 
data structures to implement the lookup table used in that algorithm. In particular, 
see Problem 12-2 in [29] for a brief introduction to radix trees, which is the data 
structure that yields the best running time (at least in principle) for our application. 

The algorithms in §11.2.3 and §11.2.4 are variants of an algorithm published by
Pohlig and Hellman [75]. See Chapter 4 of [29] for details on how one analyzes
recursive algorithms, such as the one presented in §11.2.3; in particular, Section 
4.2 in [29] discusses in detail the notion of a recursion tree. 

The algorithm in §11.2.5 is a variant of an algorithm of Pollard [76]; in fact, 
Pollard’s algorithm is a bit more efficient than the one presented here, but the 
analysis of its running time depends on stronger heuristics. Pollard’s paper also 
describes an algorithm for computing discrete logarithms that lie in a restricted 
interval: if the interval has width $w$, this algorithm runs (heuristically) in time
$w^{1/2}\,\mathrm{len}(p)^{O(1)}$, and requires space for $O(\mathrm{len}(w))$ elements of $\mathbb{Z}_p$. This algorithm
is useful in reducing the space requirement for the algorithm of Exercise 11.13.

The key establishment protocol in §11.3 is from Diffie and Hellman [34]. That
paper initiated the study of public key cryptography, which has proved to be a 
very rich field of research. Exercises 11.13 and 11.14 are based on van Oorschot 
and Wiener [74]. For more on the decisional Diffie-Hellman assumption, see 
Boneh [18]. 
12 Quadratic reciprocity and computing modular square roots


In §2.8, we initiated an investigation of quadratic residues. This chapter continues
this investigation. Recall that an integer $a$ is called a quadratic residue modulo a
positive integer $n$ if $\gcd(a, n) = 1$ and $a \equiv b^2 \pmod n$ for some integer $b$.

First, we derive the famous law of quadratic reciprocity. This law, while his- 
torically important for reasons of pure mathematical interest, also has important 
computational applications, including a fast algorithm for testing if an integer is a 
quadratic residue modulo a prime. 

Second, we investigate the problem of computing modular square roots: given a
quadratic residue $a$ modulo $n$, compute an integer $b$ such that $a \equiv b^2 \pmod n$. As
we will see, there are efficient probabilistic algorithms for this problem when n is 
prime, and more generally, when the factorization of n into primes is known. 


12.1 The Legendre symbol 

For an odd prime $p$ and an integer $a$ with $\gcd(a, p) = 1$, the Legendre symbol
$(a \mid p)$ is defined to be 1 if $a$ is a quadratic residue modulo $p$, and $-1$ otherwise. For
completeness, one defines $(a \mid p) = 0$ if $p \mid a$. The following theorem summarizes
the essential properties of the Legendre symbol.

Theorem 12.1. Let $p$ be an odd prime, and let $a, b \in \mathbb{Z}$. Then we have:

(i) $(a \mid p) \equiv a^{(p-1)/2} \pmod p$; in particular, $(-1 \mid p) = (-1)^{(p-1)/2}$;

(ii) $(a \mid p)(b \mid p) = (ab \mid p)$;

(iii) $a \equiv b \pmod p$ implies $(a \mid p) = (b \mid p)$;

(iv) $(2 \mid p) = (-1)^{(p^2-1)/8}$;

(v) if $q$ is an odd prime, then $(p \mid q) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\,(q \mid p)$.

Part (i) of the theorem is just a restatement of Euler’s criterion (Theorem 2.21). 




As was observed in Theorem 2.31, this implies that $-1$ is a quadratic residue modulo
$p$ if and only if $p \equiv 1 \pmod 4$. Thus, the quadratic residuosity of $-1$ modulo $p$
is determined by the residue class of $p$ modulo 4.

Part (ii) of the theorem follows immediately from part (i), and part (iii) is an
immediate consequence of the definition of the Legendre symbol.

Part (iv), which we will prove below, can also be recast as saying that 2 is a
quadratic residue modulo $p$ if and only if $p \equiv \pm 1 \pmod 8$. Thus, the quadratic
residuosity of 2 modulo $p$ is determined by the residue class of $p$ modulo 8.

Part (v), which we will also prove below, is the law of quadratic reciprocity.
Note that when $p = q$, both $(p \mid q)$ and $(q \mid p)$ are zero, and so the statement of part
(v) is trivially true; the interesting case is when $p \ne q$, and in this case, part (v)
is equivalent to saying that

$$(p \mid q)(q \mid p) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}.$$

Thus, the Legendre symbols $(p \mid q)$ and $(q \mid p)$ have the same values if and only
if either $p \equiv 1 \pmod 4$ or $q \equiv 1 \pmod 4$. As the following examples illustrate,
this result also shows that for a given odd prime $q$, the quadratic residuosity of $q$
modulo another odd prime $p$ is determined by the residue class of $p$ modulo either
$q$ or $4q$.
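The following Python sketch computes $(a \mid p)$ by Euler’s criterion, part (i), and spot-checks parts (iv) and (v) for small primes. Such a check is only a numerical sanity test, not a proof, and the helper names are illustrative.

```python
from math import isqrt

def legendre(a, p):
    """(a | p) for an odd prime p, computed with Euler's criterion, Theorem 12.1(i)."""
    t = pow(a, (p - 1) // 2, p)        # t is 0, 1, or p - 1
    return -1 if t == p - 1 else t

def odd_primes_below(n):
    return [m for m in range(3, n) if all(m % d for d in range(2, isqrt(m) + 1))]

def check_parts_iv_and_v(bound):
    """Check parts (iv) and (v) of Theorem 12.1 for all odd primes below bound."""
    ps = odd_primes_below(bound)
    for p in ps:
        assert legendre(2, p) == (-1) ** ((p * p - 1) // 8)           # part (iv)
        for q in ps:
            if q != p:                                                # part (v), p != q
                sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
                assert legendre(p, q) * legendre(q, p) == sign
    return len(ps)
```

For instance, `check_parts_iv_and_v(150)` runs through the 34 odd primes below 150 and returns that count if every assertion passes.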

Example 12.1. Let us characterize those primes $p$ modulo which 5 is a quadratic
residue. Since $5 \equiv 1 \pmod 4$, the law of quadratic reciprocity tells us that
$(5 \mid p) = (p \mid 5)$. Now, among the numbers $\pm 1, \pm 2$, the quadratic residues
modulo 5 are $\pm 1$. It follows that 5 is a quadratic residue modulo $p$ if and only if
$p \equiv \pm 1 \pmod 5$. This example obviously generalizes, replacing 5 by any prime
$q \equiv 1 \pmod 4$, and replacing the above congruences modulo 5 by appropriate
congruences modulo $q$. □

Example 12.2. Let us characterize those primes $p$ modulo which 3 is a quadratic residue. Since $3 \not\equiv 1 \pmod 4$, we must be careful in our application of the law of quadratic reciprocity. First, suppose that $p \equiv 1 \pmod 4$. Then $(3 \mid p) = (p \mid 3)$, and so 3 is a quadratic residue modulo $p$ if and only if $p \equiv 1 \pmod 3$. Second, suppose that $p \not\equiv 1 \pmod 4$. Then $(3 \mid p) = -(p \mid 3)$, and so 3 is a quadratic residue modulo $p$ if and only if $p \equiv -1 \pmod 3$. Putting this all together, we see that 3 is a quadratic residue modulo $p$ if and only if

$$p \equiv 1 \pmod 4 \ \text{ and } \ p \equiv 1 \pmod 3$$

or

$$p \equiv -1 \pmod 4 \ \text{ and } \ p \equiv -1 \pmod 3.$$

Using the Chinese remainder theorem, we can restate this criterion in terms of residue classes modulo 12: 3 is a quadratic residue modulo $p$ if and only if $p \equiv \pm 1 \pmod{12}$. This example obviously generalizes, replacing 3 by any prime $q \equiv -1 \pmod 4$, and replacing the above congruences modulo 12 by appropriate congruences modulo $4q$. □
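The characterizations in Examples 12.1 and 12.2 are easy to spot-check numerically. The following Python sketch (helper names are ours, not the text's) computes Legendre symbols with Euler's criterion, $(a \mid p) \equiv a^{(p-1)/2} \pmod p$, and compares them with both criteria for every prime $7 \le p < 2000$; the bound is an arbitrary choice.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test; adequate for the small m used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def check_examples(bound: int = 2000) -> bool:
    """Check Examples 12.1 and 12.2 for all primes 7 <= p < bound."""
    for p in range(7, bound):
        if not is_prime(p):
            continue
        # Example 12.1: 5 is a QR mod p  iff  p ≡ ±1 (mod 5)
        assert (legendre(5, p) == 1) == (p % 5 in (1, 4))
        # Example 12.2: 3 is a QR mod p  iff  p ≡ ±1 (mod 12)
        assert (legendre(3, p) == 1) == (p % 12 in (1, 11))
    return True
```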

The rest of this section is devoted to a proof of parts (iv) and (v) of Theo- 
rem 12.1. The proof is completely elementary, although a bit technical. 

Theorem 12.2 (Gauss’ lemma). Let $p$ be an odd prime and let $a$ be an integer not divisible by $p$. Define $\alpha_j := ja \bmod p$ for $j = 1, \ldots, (p-1)/2$, and let $n$ be the number of indices $j$ for which $\alpha_j > p/2$. Then $(a \mid p) = (-1)^n$.

Proof. Let $r_1, \ldots, r_n$ denote the values $\alpha_j$ that exceed $p/2$, and let $s_1, \ldots, s_k$ denote the remaining values $\alpha_j$. The $r_i$'s and $s_j$'s are all distinct and non-zero. We have $0 < p - r_i < p/2$ for $i = 1, \ldots, n$, and no $p - r_i$ is an $s_j$; indeed, if $p - r_i = s_j$, then $s_j \equiv -r_i \pmod p$, and writing $s_j = ua \bmod p$ and $r_i = va \bmod p$, for some $u, v \in \{1, \ldots, (p-1)/2\}$, we have $ua \equiv -va \pmod p$, which implies $u \equiv -v \pmod p$ (since $p \nmid a$), which is impossible, because $2 \le u + v \le p - 1$.

It follows that the sequence of numbers $s_1, \ldots, s_k, p - r_1, \ldots, p - r_n$ is just a reordering of $1, \ldots, (p-1)/2$. Then we have

$$\begin{aligned} ((p-1)/2)! &\equiv s_1 \cdots s_k (-r_1) \cdots (-r_n) \\ &\equiv (-1)^n s_1 \cdots s_k r_1 \cdots r_n \\ &\equiv (-1)^n ((p-1)/2)!\, a^{(p-1)/2} \pmod p, \end{aligned}$$

and canceling the factor $((p-1)/2)!$, which is not divisible by $p$, we obtain $a^{(p-1)/2} \equiv (-1)^n \pmod p$, and the result follows from the fact that $(a \mid p) \equiv a^{(p-1)/2} \pmod p$. □
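A direct transcription of Gauss' lemma into Python can be checked against Euler's criterion. This is an illustrative sketch; `gauss_lemma_sign` is a hypothetical name, and the comparison $2\alpha_j > p$ is used in place of $\alpha_j > p/2$ to stay in integer arithmetic.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_lemma_sign(a: int, p: int) -> int:
    """Return (-1)^n, where n = #{j in 1..(p-1)/2 : (j*a mod p) > p/2}."""
    half = (p - 1) // 2
    n = sum(1 for j in range(1, half + 1) if 2 * (j * a % p) > p)
    return -1 if n % 2 else 1


# Example: p = 7, a = 3 gives alpha = (3, 6, 2); only 6 exceeds 7/2, so n = 1.
print(gauss_lemma_sign(3, 7), legendre(3, 7))
```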

Theorem 12.3. If $p$ is an odd prime and $\gcd(a, 2p) = 1$, then $(a \mid p) = (-1)^t$, where $t = \sum_{j=1}^{(p-1)/2} \lfloor ja/p \rfloor$. Also, $(2 \mid p) = (-1)^{(p^2-1)/8}$.

Proof. Let $a$ be an integer not divisible by $p$, but which may be even, and let us adopt the same notation as in the statement and proof of Theorem 12.2; in particular, $\alpha_1, \ldots, \alpha_{(p-1)/2}$, $r_1, \ldots, r_n$, and $s_1, \ldots, s_k$ are as defined there. Note that $ja = p\lfloor ja/p \rfloor + \alpha_j$, for $j = 1, \ldots, (p-1)/2$, so we have

$$\sum_{j=1}^{(p-1)/2} ja = \sum_{j=1}^{(p-1)/2} p\lfloor ja/p \rfloor + \sum_{j=1}^{n} r_j + \sum_{j=1}^{k} s_j. \tag{12.1}$$

Moreover, as we saw in the proof of Theorem 12.2, the sequence of numbers $s_1, \ldots, s_k, p - r_1, \ldots, p - r_n$ is a reordering of $1, \ldots, (p-1)/2$, and hence

$$\sum_{j=1}^{(p-1)/2} j = \sum_{j=1}^{n} (p - r_j) + \sum_{j=1}^{k} s_j = np - \sum_{j=1}^{n} r_j + \sum_{j=1}^{k} s_j. \tag{12.2}$$

Subtracting (12.2) from (12.1), we get

$$(a-1)\sum_{j=1}^{(p-1)/2} j = p\Bigl(\sum_{j=1}^{(p-1)/2} \lfloor ja/p \rfloor - n\Bigr) + 2\sum_{j=1}^{n} r_j. \tag{12.3}$$

Note that

$$\sum_{j=1}^{(p-1)/2} j = \frac{p^2-1}{8}, \tag{12.4}$$

which together with (12.3) implies

$$(a-1)\frac{p^2-1}{8} \equiv \sum_{j=1}^{(p-1)/2} \lfloor ja/p \rfloor - n \pmod 2. \tag{12.5}$$

If $a$ is odd, (12.5) implies

$$n \equiv \sum_{j=1}^{(p-1)/2} \lfloor ja/p \rfloor \pmod 2. \tag{12.6}$$

If $a = 2$, then $\lfloor 2j/p \rfloor = 0$ for $j = 1, \ldots, (p-1)/2$, and (12.5) implies

$$n \equiv \frac{p^2-1}{8} \pmod 2. \tag{12.7}$$

The theorem now follows from (12.6) and (12.7), together with Theorem 12.2. □
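Both formulas in Theorem 12.3 can be spot-checked by computing the exponent $t$ directly. In the Python sketch below, the function names are ours; the first function assumes $\gcd(a, 2p) = 1$ as in the theorem, and the test ranges are arbitrary.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def floor_sum_sign(a: int, p: int) -> int:
    """(-1)^t with t = sum_{j=1}^{(p-1)/2} floor(j*a/p); assumes gcd(a, 2p) = 1."""
    t = sum(j * a // p for j in range(1, (p - 1) // 2 + 1))
    return -1 if t % 2 else 1


def two_sign(p: int) -> int:
    """(-1)^((p^2 - 1)/8), which Theorem 12.3 identifies with (2 | p)."""
    return -1 if ((p * p - 1) // 8) % 2 else 1
```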

Note that this last theorem proves part (iv) of Theorem 12.1. The next theorem proves part (v).


Theorem 12.4. If p and q are distinct odd primes, then 

$$(p \mid q)(q \mid p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

Proof. Let $S$ be the set of pairs of integers $(x, y)$ with $1 \le x \le (p-1)/2$ and $1 \le y \le (q-1)/2$. Note that $S$ contains no pair $(x, y)$ with $qx = py$ (such an equality would force $p \mid x$, which is impossible since $1 \le x < p$), so let us partition $S$ into two subsets: $S_1$ contains all pairs $(x, y)$ with $qx > py$, and $S_2$ contains all pairs $(x, y)$ with $qx < py$. Note that $(x, y) \in S_1$ if and only if $1 \le x \le (p-1)/2$ and $1 \le y \le \lfloor qx/p \rfloor$. So $|S_1| = \sum_{x=1}^{(p-1)/2} \lfloor qx/p \rfloor$. Similarly, $|S_2| = \sum_{y=1}^{(q-1)/2} \lfloor py/q \rfloor$. So we have

$$\frac{p-1}{2} \cdot \frac{q-1}{2} = |S| = |S_1| + |S_2| = \sum_{x=1}^{(p-1)/2} \lfloor qx/p \rfloor + \sum_{y=1}^{(q-1)/2} \lfloor py/q \rfloor,$$

and Theorem 12.3 implies

$$(p \mid q)(q \mid p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

□
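The counting argument in this proof can be replayed by brute force for small primes. The sketch below (function names ours) enumerates $S_1$ and $S_2$ directly, checks them against the two floor sums, and then confirms the reciprocity law itself, using Euler's criterion for the Legendre symbols.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def check_reciprocity(p: int, q: int) -> bool:
    """Replay the proof of Theorem 12.4 for distinct odd primes p, q."""
    hp, hq = (p - 1) // 2, (q - 1) // 2
    pairs = [(x, y) for x in range(1, hp + 1) for y in range(1, hq + 1)]
    s1 = sum(1 for x, y in pairs if q * x > p * y)
    s2 = sum(1 for x, y in pairs if q * x < p * y)
    # No pair satisfies qx = py, so S1 and S2 partition S.
    assert s1 + s2 == hp * hq
    # |S1| and |S2| agree with the floor sums used in the proof.
    assert s1 == sum(q * x // p for x in range(1, hp + 1))
    assert s2 == sum(p * y // q for y in range(1, hq + 1))
    sign = -1 if (hp * hq) % 2 else 1
    return legendre(p, q) * legendre(q, p) == sign
```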

Exercise 12.1. Characterize those odd primes p for which (15 | p) = 1, in terms 
of the residue class of p modulo 60. 

Exercise 12.2. Let $p$ be an odd prime. Show that the following are equivalent:

(a) $(-2 \mid p) = 1$;

(b) $p \equiv 1$ or $3 \pmod 8$;

(c) $p = r^2 + 2t^2$ for some $r, t \in \mathbb{Z}$.


12.2 The Jacobi symbol 

Let $a, n$ be integers, where $n$ is positive and odd, so that $n = q_1 \cdots q_k$, where the $q_i$'s are odd primes, not necessarily distinct. Then the Jacobi symbol $(a \mid n)$ is defined as

$$(a \mid n) := (a \mid q_1) \cdots (a \mid q_k),$$

where $(a \mid q_i)$ is the Legendre symbol. By definition, $(a \mid 1) = 1$ for all $a \in \mathbb{Z}$. Thus, the Jacobi symbol essentially extends the domain of definition of the Legendre symbol. Note that $(a \mid n) \in \{0, \pm 1\}$, and that $(a \mid n) = 0$ if and only if $\gcd(a, n) > 1$. The following theorem summarizes the essential properties of the Jacobi symbol.
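The definition translates directly into code once $n$ is factored. The Python sketch below (names ours) uses trial division to factor $n$, so it is only practical for small $n$; it illustrates the definition and is not meant as an efficient way of computing $(a \mid n)$.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def odd_prime_factors(n: int):
    """Prime factors of an odd n >= 1, with multiplicity, by trial division."""
    factors, d = [], 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors


def jacobi_by_definition(a: int, n: int) -> int:
    """(a | n) = (a | q_1) ... (a | q_k) for n = q_1 ... q_k odd and positive."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    result = 1  # empty product, so (a | 1) = 1
    for q in odd_prime_factors(n):
        result *= legendre(a, q)
    return result
```

For instance, `jacobi_by_definition(2, 9)` is 1 even though 2 is not a square modulo 9, an early instance of the failure of the converse discussed below.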

Theorem 12.5. Let $m, n$ be odd, positive integers, and let $a, b \in \mathbb{Z}$. Then we have:

(i) $(ab \mid n) = (a \mid n)(b \mid n)$;

(ii) $(a \mid mn) = (a \mid m)(a \mid n)$;

(iii) $a \equiv b \pmod n$ implies $(a \mid n) = (b \mid n)$;

(iv) $(-1 \mid n) = (-1)^{(n-1)/2}$;

(v) $(2 \mid n) = (-1)^{(n^2-1)/8}$;

(vi) $(m \mid n) = (-1)^{\frac{m-1}{2} \cdot \frac{n-1}{2}} (n \mid m)$.





Proof. Parts (i)-(iii) follow directly from the definition (exercise). 

For parts (iv) and (vi), one can easily verify (exercise) that for all odd integers $n_1, \ldots, n_k$,

$$\sum_{i=1}^{k} (n_i - 1)/2 \equiv (n_1 \cdots n_k - 1)/2 \pmod 2.$$

Part (iv) easily follows from this fact, along with part (ii) of this theorem and part (i) of Theorem 12.1 (exercise). Part (vi) easily follows from this fact, along with parts (i) and (ii) of this theorem, and part (v) of Theorem 12.1 (exercise).

For part (v), one can easily verify (exercise) that for odd integers $n_1, \ldots, n_k$,

$$\sum_{i=1}^{k} (n_i^2 - 1)/8 \equiv (n_1^2 \cdots n_k^2 - 1)/8 \pmod 2.$$

Part (v) easily follows from this fact, along with part (ii) of this theorem, and part (iv) of Theorem 12.1 (exercise). □

As we shall see later, this theorem is extremely useful from a computational point of view: with it, one can efficiently compute $(a \mid n)$, without having to know the prime factorization of either $a$ or $n$. Also, in applying this theorem it is useful to observe that for all odd integers $m, n$,

• $(-1)^{(n-1)/2} = 1 \iff n \equiv 1 \pmod 4$;

• $(-1)^{(n^2-1)/8} = 1 \iff n \equiv \pm 1 \pmod 8$;

• $(-1)^{((m-1)/2)((n-1)/2)} = 1 \iff m \equiv 1 \pmod 4$ or $n \equiv 1 \pmod 4$.

Suppose $a$ is a quadratic residue modulo $n$, so that $a \equiv b^2 \pmod n$, where $\gcd(a, n) = 1 = \gcd(b, n)$. Then by parts (iii) and (i) of Theorem 12.5, we have $(a \mid n) = (b^2 \mid n) = (b \mid n)^2 = 1$. Thus, if $a$ is a quadratic residue modulo $n$, then $(a \mid n) = 1$. The converse, however, does not hold: $(a \mid n) = 1$ does not imply that $a$ is a quadratic residue modulo $n$ (see Exercise 12.3 below).

It is sometimes useful to view the Jacobi symbol as a group homomorphism. Let $n$ be an odd, positive integer. Define the Jacobi map

$$J_n : \mathbb{Z}_n^* \to \{\pm 1\}, \quad [a]_n \mapsto (a \mid n).$$

First, we note that by part (iii) of Theorem 12.5, this definition is unambiguous. Second, we note that since $\gcd(a, n) = 1$ implies $(a \mid n) = \pm 1$, the image of $J_n$ is indeed contained in $\{\pm 1\}$. Third, we note that by part (i) of Theorem 12.5, $J_n$ is a group homomorphism. Since $J_n$ is a group homomorphism, it follows that its kernel, $\operatorname{Ker} J_n$, is a subgroup of $\mathbb{Z}_n^*$.
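For a concrete picture of the Jacobi map, the Python sketch below (names ours) enumerates $\mathbb{Z}_n^*$, $\operatorname{Ker} J_n$, and $(\mathbb{Z}_n^*)^2$ by brute force for a small $n$ whose prime factorization the caller supplies. For $n = 15$ it exhibits an element, $[2]_{15}$, with Jacobi symbol 1 that is not a square, so the converse mentioned above indeed fails.

```python
from math import gcd


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def jacobi_map_data(n: int, primes):
    """Brute-force Z_n^*, Ker J_n, and (Z_n^*)^2 as sorted lists of residues.

    `primes` must list the odd prime factors of n with multiplicity.
    """
    def J(a: int) -> int:
        value = 1
        for q in primes:
            value *= legendre(a, q)
        return value

    units = [a for a in range(1, n) if gcd(a, n) == 1]
    kernel = [a for a in units if J(a) == 1]
    squares = sorted({a * a % n for a in units})
    return units, kernel, squares


units, kernel, squares = jacobi_map_data(15, [3, 5])
print(kernel, squares)
```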
Exercise 12.3. Let $n$ be an odd, positive integer, and consider the Jacobi map $J_n : \mathbb{Z}_n^* \to \{\pm 1\}$.

(a) Show that $(\mathbb{Z}_n^*)^2 \subseteq \operatorname{Ker} J_n$.

(b) Show that if $n$ is the square of an integer, then $\operatorname{Ker} J_n = \mathbb{Z}_n^*$.

(c) Show that if $n$ is not the square of an integer, then $[\mathbb{Z}_n^* : \operatorname{Ker} J_n] = 2$ and $[\operatorname{Ker} J_n : (\mathbb{Z}_n^*)^2] = 2^{r-1}$, where $r$ is the number of distinct prime divisors of $n$.


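Exercise 12.4. Let $p$ and $q$ be distinct primes, with $p \equiv q \equiv 3 \pmod 4$, and let $n := pq$.

(a) Show that $[-1]_n \in \operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$, and from this, conclude that the cosets of $(\mathbb{Z}_n^*)^2$ in $\operatorname{Ker} J_n$ are the two distinct cosets $(\mathbb{Z}_n^*)^2$ and $[-1]_n(\mathbb{Z}_n^*)^2$.

(b) Let $\delta \in \mathbb{Z}_n^* \setminus \operatorname{Ker} J_n$. Show that the map from $\{0, 1\} \times \{0, 1\} \times (\mathbb{Z}_n^*)^2$ to $\mathbb{Z}_n^*$ that sends $(a, b, \gamma)$ to $\delta^a (-1)^b \gamma$ is a bijection.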


12.3 Computing the Jacobi symbol 

Suppose we are given an odd, positive integer n, along with an integer a, and we 
want to compute the Jacobi symbol ( a \ n). Theorem 12.5 suggests the following 
algorithm: 

σ ← 1
repeat
    // loop invariant: n is odd and positive
    a ← a mod n
    if a = 0 then
        if n = 1 then return σ else return 0
    compute a′, h such that a = 2^h a′ and a′ is odd
    if h ≢ 0 (mod 2) and n ≢ ±1 (mod 8) then σ ← −σ
    if a′ ≢ 1 (mod 4) and n ≢ 1 (mod 4) then σ ← −σ
    (a, n) ← (n, a′)
forever

That this algorithm correctly computes the Jacobi symbol $(a \mid n)$ follows directly from Theorem 12.5. Using an analysis similar to that of Euclid’s algorithm, one easily sees that the running time of this algorithm is $O(\operatorname{len}(a) \operatorname{len}(n))$.
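Here is one way to render the algorithm above in Python. It follows the pseudocode step by step, with `sigma` playing the role of $\sigma$ and `a1` that of $a'$; the function name `jacobi` is our own choice. Python's `%` already returns a value in $\{0, \ldots, n-1\}$ for positive $n$, even when $a$ is negative, which matches the step $a \leftarrow a \bmod n$.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0, following the loop in the text."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    sigma = 1
    while True:
        # loop invariant: n is odd and positive
        a %= n
        if a == 0:
            return sigma if n == 1 else 0
        # write a = 2^h * a1 with a1 odd
        h = 0
        while a % 2 == 0:
            a //= 2
            h += 1
        a1 = a
        # Theorem 12.5(v): (2 | n) = -1 exactly when n is not ±1 mod 8
        if h % 2 == 1 and n % 8 not in (1, 7):
            sigma = -sigma
        # Theorem 12.5(vi): swapping a1 and n flips the sign iff a1 ≡ n ≡ 3 (mod 4)
        if a1 % 4 != 1 and n % 4 != 1:
            sigma = -sigma
        a, n = n, a1
```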

Exercise 12.5. Develop a “binary” Jacobi symbol algorithm, that is, one that 
uses only addition, subtractions, and “shift” operations, analogous to the binary 
gcd algorithm in Exercise 4.6. 





Exercise 12.6. This exercise develops a probabilistic primality test based on the Jacobi symbol. For odd integer $n > 1$, define

$$G_n := \{\alpha \in \mathbb{Z}_n^* : \alpha^{(n-1)/2} = J_n(\alpha)\},$$

where $J_n : \mathbb{Z}_n^* \to \{\pm 1\}$ is the Jacobi map.

(a) Show that $G_n$ is a subgroup of $\mathbb{Z}_n^*$.

(b) Show that if $n$ is prime, then $G_n = \mathbb{Z}_n^*$.

(c) Show that if $n$ is composite, then $G_n \subsetneq \mathbb{Z}_n^*$.

(d) Based on parts (a)-(c), design and analyze an efficient probabilistic primality test that works by choosing a random, non-zero element $\alpha \in \mathbb{Z}_n$, and testing if $\alpha \in G_n$.

12.4 Testing quadratic residuosity 

In this section, we consider the problem of testing whether a is a quadratic residue 
modulo n, for given integers a and n, from a computational perspective. 


12.4.1 Prime modulus 

For an odd prime $p$, we can test if an integer $a$ is a quadratic residue modulo $p$ by either performing the exponentiation $a^{(p-1)/2} \bmod p$ or by computing the Legendre symbol $(a \mid p)$. Assume that $0 \le a < p$. Using a standard repeated squaring algorithm, the former method takes time $O(\operatorname{len}(p)^3)$, while using the Euclidean-like algorithm of the previous section, the latter method takes time $O(\operatorname{len}(p)^2)$. So clearly, the latter method is to be preferred.


12.4.2 Prime-power modulus 

For an odd prime $p$, we know that $a$ is a quadratic residue modulo $p^e$ if and only if $a$ is a quadratic residue modulo $p$ (see Theorem 2.30). So this case immediately reduces to the previous one.


12.4.3 Composite modulus 

For odd, composite n, if we know the factorization of n, then we can also determine 
if a is a quadratic residue modulo n by determining if it is a quadratic residue 
modulo each prime divisor p of n (see Exercise 2.39). However, without knowledge 
of this factorization (which is in general believed to be hard to compute), there is 
no efficient algorithm known. We can compute the Jacobi symbol $(a \mid n)$; if this is $-1$ or 0, we can conclude that $a$ is not a quadratic residue; otherwise, we cannot conclude much of anything.


12.5 Computing modular square roots 

In this section, we consider the problem of computing a square root of a modulo n, 
given integers a and n, where a is a quadratic residue modulo n. 


12.5.1 Prime modulus 

Let $p$ be an odd prime, and let $a$ be an integer such that $0 < a < p$ and $(a \mid p) = 1$. We would like to compute a square root of $a$ modulo $p$. Let $\alpha := [a]_p \in \mathbb{Z}_p^*$, so that we can restate our problem as that of finding $\beta \in \mathbb{Z}_p^*$ such that $\beta^2 = \alpha$, given $\alpha \in (\mathbb{Z}_p^*)^2$.

We first consider the special case where $p \equiv 3 \pmod 4$, in which it turns out that this problem can be solved very easily. Indeed, we claim that in this case

$$\beta := \alpha^{(p+1)/4}$$

is a square root of $\alpha$; note that since $p \equiv 3 \pmod 4$, the number $(p+1)/4$ is an integer. To show that $\beta^2 = \alpha$, suppose $\alpha = \tilde\beta^2$ for some $\tilde\beta \in \mathbb{Z}_p^*$. We know that there is such a $\tilde\beta$, since we are assuming that $\alpha \in (\mathbb{Z}_p^*)^2$. Then we have

$$\beta^2 = \alpha^{(p+1)/2} = \tilde\beta^{p+1} = \tilde\beta^2 = \alpha,$$

where we used Fermat’s little theorem for the third equality. Using a repeated-squaring algorithm, we can compute $\beta$ in time $O(\operatorname{len}(p)^3)$.
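In Python, the special case $p \equiv 3 \pmod 4$ is a single modular exponentiation with the built-in three-argument `pow`. The sketch below (name ours) also squares the candidate as a check, since for a non-residue the same exponentiation silently returns a value that is not a square root.

```python
def sqrt_mod_p_3mod4(a: int, p: int) -> int:
    """A square root of a modulo a prime p ≡ 3 (mod 4), where (a | p) = 1."""
    if p % 4 != 3:
        raise ValueError("requires p ≡ 3 (mod 4)")
    beta = pow(a, (p + 1) // 4, p)  # beta = alpha^((p+1)/4)
    if beta * beta % p != a % p:
        raise ValueError("a is not a quadratic residue modulo p")
    return beta
```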

Now we consider the general case, where we may have $p \not\equiv 3 \pmod 4$. Here is one way to efficiently compute a square root of $\alpha$, assuming we are given, in addition to $\alpha$, an auxiliary input $\gamma \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$ (how one obtains such a $\gamma$ is discussed below).

Let us write $p - 1 = 2^h m$, where $m$ is odd. For every $\delta \in \mathbb{Z}_p^*$, $\delta^m$ has multiplicative order dividing $2^h$. Since $\alpha^{2^{h-1} m} = 1$, $\alpha^m$ has multiplicative order dividing $2^{h-1}$. Since $\gamma^{2^{h-1} m} = -1$, $\gamma^m$ has multiplicative order precisely $2^h$. Since there is only one subgroup of $\mathbb{Z}_p^*$ of order $2^h$, it follows that $\gamma^m$ generates this subgroup, and that $\alpha^m = \gamma^{mx}$ for some integer $x$, where $0 \le x < 2^h$ and $x$ is even. We can find $x$ by computing the discrete logarithm of $\alpha^m$ to the base $\gamma^m$, using the algorithm in §11.2.3. Setting $\kappa := \gamma^{mx/2}$, we have

$$\kappa^2 = \alpha^m.$$

We are not quite done, since we now have a square root of $\alpha^m$, and not of $\alpha$. Since $m$ is odd, we may write $m = 2t + 1$ for some non-negative integer $t$. It then follows that

$$(\kappa\alpha^{-t})^2 = \kappa^2\alpha^{-2t} = \alpha^m\alpha^{-2t} = \alpha^{m-2t} = \alpha.$$

Thus, $\kappa\alpha^{-t}$ is a square root of $\alpha$.

Let us summarize the above algorithm for computing a square root of $\alpha \in (\mathbb{Z}_p^*)^2$, assuming we are given $\gamma \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$, in addition to $\alpha$:

compute positive integers m, h such that p − 1 = 2^h m with m odd
γ′ ← γ^m, α′ ← α^m
compute x ← log_γ′ α′    // note that 0 ≤ x < 2^h and x is even
β ← (γ′)^(x/2) α^(−⌊m/2⌋)
output β

The work done outside the discrete logarithm calculation amounts to just a handful of exponentiations modulo $p$, and so takes time $O(\operatorname{len}(p)^3)$. The time to compute the discrete logarithm is $O(h \operatorname{len}(h) \operatorname{len}(p)^2)$. So the total running time of this procedure is

$$O(\operatorname{len}(p)^3 + h \operatorname{len}(h) \operatorname{len}(p)^2).$$

The above procedure assumed we had at hand a non-square $\gamma$. If $h = 1$, which means that $p \equiv 3 \pmod 4$, then $(-1 \mid p) = -1$, and so we are done: we may simply take $\gamma := [-1]_p$. However, we have already seen how to efficiently compute a square root in this case.

If $h > 1$, we can find a non-square $\gamma$ using a probabilistic search algorithm. Simply choose $\gamma$ at random, test if it is a square, and if so, repeat. The probability that a random element of $\mathbb{Z}_p^*$ is a square is 1/2; thus, the expected number of trials until we find a non-square is 2; moreover, the running time per trial is $O(\operatorname{len}(p)^2)$, and hence the expected running time of this probabilistic search algorithm is $O(\operatorname{len}(p)^2)$.
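Putting the pieces of §12.5.1 together, the following Python sketch computes square roots modulo any odd prime. Two substitutions should be flagged: the non-square $\gamma$ is found by random search using Euler's criterion as the squareness test, and the discrete logarithm of $\alpha'$ to the base $\gamma'$ is computed by a simple bit-by-bit method for a cyclic group of order $2^h$. We use that method in place of the algorithm of §11.2.3; it is simpler and does not attempt to meet the running-time bound quoted above. All names are ours, and `pow` with a negative exponent needs Python 3.8 or later.

```python
import random


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def find_non_square(p: int, rng=random) -> int:
    """Random search for a non-square gamma modulo p; expected 2 trials."""
    while True:
        gamma = rng.randrange(1, p)
        if legendre(gamma, p) == -1:
            return gamma


def dlog_2power(target: int, g: int, h: int, p: int) -> int:
    """Return x in [0, 2^h) with g^x ≡ target (mod p), where g has order 2^h.

    Bit i of x is read off by raising target * g^(-x_so_far) to the power
    2^(h-1-i): the result is 1 if that bit is 0 and -1 if it is 1.
    """
    g_inv = pow(g, -1, p)
    x = 0
    for i in range(h):
        t = target * pow(g_inv, x, p) % p
        if pow(t, 1 << (h - 1 - i), p) != 1:
            x |= 1 << i
    return x


def sqrt_mod_p(a: int, p: int, rng=random) -> int:
    """A square root of a modulo an odd prime p, assuming (a | p) is 0 or 1."""
    a %= p
    if a == 0:
        return 0
    if legendre(a, p) != 1:
        raise ValueError("a is not a quadratic residue modulo p")
    m, h = p - 1, 0                  # p - 1 = 2^h * m with m odd
    while m % 2 == 0:
        m //= 2
        h += 1
    gamma = find_non_square(p, rng)
    g1 = pow(gamma, m, p)            # gamma' = gamma^m has order exactly 2^h
    a1 = pow(a, m, p)                # alpha' = alpha^m
    x = dlog_2power(a1, g1, h, p)    # x is even
    # beta = (gamma')^(x/2) * alpha^(-floor(m/2))
    return pow(g1, x // 2, p) * pow(a, -(m // 2), p) % p
```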


12.5.2 Prime-power modulus 

Let $p$ be an odd prime, let $a$ be an integer relatively prime to $p$, and let $e > 1$ be an integer. We know that $a$ is a quadratic residue modulo $p^e$ if and only if $a$ is a quadratic residue modulo $p$. Suppose that $a$ is a quadratic residue modulo $p$, and that we have found an integer $b$ such that $b^2 \equiv a \pmod p$, using, say, one of the procedures described in §12.5.1. From this, we can easily compute a square root of $a$ modulo $p^e$ using the following technique, which is known as Hensel lifting.

More generally, suppose that for some $f \ge 1$, we have computed an integer $b$ satisfying the congruence $b^2 \equiv a \pmod{p^f}$, and we want to find an integer $c$ satisfying the congruence $c^2 \equiv a \pmod{p^{f+1}}$. Clearly, if $c^2 \equiv a \pmod{p^{f+1}}$, then $c^2 \equiv a \pmod{p^f}$, and so $c \equiv \pm b \pmod{p^f}$. So let us set $c = b + p^f h$, and solve for $h$. We have

$$c^2 = (b + p^f h)^2 = b^2 + 2bp^f h + p^{2f} h^2 \equiv b^2 + 2bp^f h \pmod{p^{f+1}}.$$

So we want to find an integer $h$ satisfying the linear congruence

$$2bp^f h \equiv a - b^2 \pmod{p^{f+1}}. \tag{12.8}$$

Since $p \nmid 2b$, we have $\gcd(2bp^f, p^{f+1}) = p^f$. Furthermore, since $b^2 \equiv a \pmod{p^f}$, we have $p^f \mid (a - b^2)$. Therefore, Theorem 2.5 implies that (12.8) has a unique solution $h$ modulo $p$, which we can efficiently compute as in Example 4.3.

By iterating the above procedure, starting with a square root of a modulo p, we 
can quickly find a square root of a modulo p e . We leave a detailed analysis of the 
running time of this procedure to the reader. 
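The lifting step can be written in a few lines of Python. In the sketch below (names ours), the congruence (12.8) is solved by dividing through by $p^f$, which leaves $2bh \equiv (a - b^2)/p^f \pmod p$, and then inverting $2b$ modulo $p$ with the built-in `pow` (Python 3.8 or later); this is one concrete way to carry out the computation the text attributes to Example 4.3.

```python
def hensel_sqrt(a: int, b: int, p: int, e: int) -> int:
    """Given b with b^2 ≡ a (mod p), p an odd prime not dividing b,
    return c with c^2 ≡ a (mod p^e) and c ≡ b (mod p)."""
    if (b * b - a) % p != 0 or b % p == 0:
        raise ValueError("need b^2 ≡ a (mod p) with p not dividing b")
    c, pf = b % p, p                   # invariant: c^2 ≡ a (mod pf), pf = p^f
    for _ in range(e - 1):
        rhs = ((a - c * c) // pf) % p  # exact division: pf divides a - c^2
        h = rhs * pow(2 * c, -1, p) % p
        c += pf * h                    # now c^2 ≡ a (mod p^(f+1))
        pf *= p
    return c % pf
```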


12.5.3 Composite modulus 

To find square roots modulo n, where n is an odd composite modulus, if we know 
the prime factorization of n, then we can use the above procedures for finding 
square roots modulo primes and prime powers, and then use the algorithm of the 
Chinese remainder theorem to get a square root modulo n. 

However, if the factorization of n is not known, then there is no efficient algo- 
rithm known for computing square roots modulo n. In fact, one can show that 
the problem of finding square roots modulo n is at least as hard as the problem of 
factoring n, in the sense that if there is an efficient algorithm for computing square 
roots modulo n, then there is an efficient (probabilistic) algorithm for factoring n. 

We now present an algorithm to factor n, using a modular square-root algorithm 
A as a subroutine. For simplicity, we assume that A is deterministic, and that for all 
n and for all $\alpha \in (\mathbb{Z}_n^*)^2$, $A(n, \alpha)$ outputs a square root of $\alpha$. Also for simplicity, we
shall assume that n is of the form n = pq, where p and q are distinct, odd primes. 
In Exercise 12.15 below, you are asked to relax these restrictions. Our algorithm 
runs as follows: 

β ←R Z_n^+, d ← gcd(rep(β), n)
if d > 1 then
    output d
else
    α ← β^2, β′ ← A(n, α)
    if β = ±β′
        then output “failure”
        else output gcd(rep(β − β′), n)

Here, $\mathbb{Z}_n^+$ denotes the set of non-zero elements of $\mathbb{Z}_n$. Also, recall that $\operatorname{rep}(\beta)$ denotes the canonical representative of $\beta$.

First, we argue that the algorithm outputs either “failure” or a non-trivial factor of $n$. Clearly, if $\beta \notin \mathbb{Z}_n^*$, then the value $d$ computed by the algorithm is a non-trivial factor. So suppose $\beta \in \mathbb{Z}_n^*$. In this case, the algorithm invokes $A$ on inputs $n$ and $\alpha := \beta^2$, obtaining a square root $\beta'$ of $\alpha$. Suppose that $\beta \ne \pm\beta'$, and set $\gamma := \beta - \beta'$. What we need to show is that $\gcd(\operatorname{rep}(\gamma), n)$ is a non-trivial factor of $n$. To see this, consider the ring isomorphism of the Chinese remainder theorem

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_p \times \mathbb{Z}_q, \quad [a]_n \mapsto ([a]_p, [a]_q).$$

Suppose $\theta(\beta') = (\beta'_1, \beta'_2)$. Then the four square roots of $\alpha$ are

$$\beta' = \theta^{-1}(\beta'_1, \beta'_2), \quad -\beta' = \theta^{-1}(-\beta'_1, -\beta'_2), \quad \theta^{-1}(-\beta'_1, \beta'_2), \quad \theta^{-1}(\beta'_1, -\beta'_2).$$

The assumption that $\beta \ne \pm\beta'$ implies that $\theta(\beta) = (-\beta'_1, \beta'_2)$ or $\theta(\beta) = (\beta'_1, -\beta'_2)$. In the first case, $\theta(\gamma) = (-2\beta'_1, 0)$, which implies $\gcd(\operatorname{rep}(\gamma), n) = q$. In the second case, $\theta(\gamma) = (0, -2\beta'_2)$, which implies $\gcd(\operatorname{rep}(\gamma), n) = p$.

Second, we argue that $P[F] \le 1/2$, where $F$ is the event that the algorithm outputs “failure.” Viewed as a random variable, $\beta$ is uniformly distributed over $\mathbb{Z}_n^+$. Clearly, $P[F \mid \beta \notin \mathbb{Z}_n^*] = 0$. Now consider any fixed $\alpha' \in (\mathbb{Z}_n^*)^2$. Observe that the conditional distribution of $\beta$ given that $\beta^2 = \alpha'$ is (essentially) the uniform distribution on the set of four square roots of $\alpha'$. Also observe that the output of $A$ depends only on $n$ and $\beta^2$, and so with respect to the conditional distribution given that $\beta^2 = \alpha'$, the output $\beta'$ of $A$ is fixed. Thus,

$$P[F \mid \beta^2 = \alpha'] = P[\beta = \pm\beta' \mid \beta^2 = \alpha'] = 1/2.$$

Putting everything together, using total probability, we have

$$\begin{aligned} P[F] &= P[F \mid \beta \notin \mathbb{Z}_n^*]\, P[\beta \notin \mathbb{Z}_n^*] + \sum_{\alpha' \in (\mathbb{Z}_n^*)^2} P[F \mid \beta^2 = \alpha']\, P[\beta^2 = \alpha'] \\ &= 0 \cdot P[\beta \notin \mathbb{Z}_n^*] + \sum_{\alpha' \in (\mathbb{Z}_n^*)^2} \frac{1}{2} \cdot P[\beta^2 = \alpha'] \le \frac{1}{2}. \end{aligned}$$

Thus, the above algorithm fails to split $n$ with probability at most 1/2. If we like, we can repeat the algorithm until it succeeds. The expected number of iterations performed will be at most 2.
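To watch the reduction work, the Python sketch below runs the algorithm against a toy, hypothetical square-root oracle. The oracle is built from the factorization $n = pq$ (which a real attacker would not have) and returns the root whose components modulo $p$ and $q$ are the smallest; its answer therefore depends only on $(n, \alpha)$, as the analysis requires. All names are ours, the primes are tiny illustrative values, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
import math
import random


def make_toy_oracle(p: int, q: int):
    """Hypothetical deterministic square-root oracle A(n, alpha) for n = p*q."""
    def smallest_root(a: int, r: int) -> int:
        return min(x for x in range(r) if x * x % r == a % r)

    def A(n: int, alpha: int) -> int:
        rp, rq = smallest_root(alpha, p), smallest_root(alpha, q)
        # Chinese remainder theorem: beta ≡ rp (mod p) and beta ≡ rq (mod q)
        return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % n

    return A


def split_once(n: int, A, rng=random):
    """One run of the algorithm in the text: a factor of n, or None for "failure"."""
    beta = rng.randrange(1, n)           # beta chosen from Z_n^+
    d = math.gcd(beta, n)
    if d > 1:
        return d
    beta1 = A(n, beta * beta % n)        # beta' = A(n, beta^2)
    if beta1 in (beta, n - beta):        # beta = ±beta'
        return None
    return math.gcd((beta - beta1) % n, n)
```

Repeating `split_once` until it returns a factor mirrors the "repeat until it succeeds" strategy; with this oracle roughly half of the runs fail, consistent with the bound $P[F] \le 1/2$.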


Exercise 12.7. Let $p$ be an odd prime, and let $f \in \mathbb{Z}_p[X]$ be a polynomial with $0 \le \deg(f) \le 2$. Design and analyze an efficient, deterministic algorithm that takes as input $p$, $f$, and an element of $\mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$, and which determines if $f$ has any roots in $\mathbb{Z}_p$, and if so, finds all of the roots. Hint: see Exercise 7.17.

Exercise 12.8. Show how to deterministically compute square roots modulo primes $p \equiv 5 \pmod 8$ in time $O(\operatorname{len}(p)^3)$.

Exercise 12.9. This exercise develops an alternative algorithm for computing square roots modulo a prime. Let $p$ be an odd prime, let $\beta \in \mathbb{Z}_p^*$, and set $\alpha := \beta^2$. Define $B_\alpha := \{\gamma \in \mathbb{Z}_p : \gamma^2 - \alpha \in (\mathbb{Z}_p^*)^2\}$.

(a) Show that $B_\alpha = \{\gamma \in \mathbb{Z}_p : g(\gamma) = 0\}$, where

$$g := (X - \beta)^{(p-1)/2} - (X + \beta)^{(p-1)/2} \in \mathbb{Z}_p[X].$$

(b) Let $\gamma \in \mathbb{Z}_p \setminus B_\alpha$, and suppose $\gamma^2 \ne \alpha$. Let $\mu, \nu$ be the uniquely determined elements of $\mathbb{Z}_p$ satisfying the polynomial congruence

$$\mu + \nu X \equiv (\gamma - X)^{(p-1)/2} \pmod{X^2 - \alpha}.$$

Show that $\mu = 0$ and $\nu^{-2} = \alpha$.

(c) Using parts (a) and (b), design and analyze a probabilistic algorithm that computes a square root of a given $\alpha \in (\mathbb{Z}_p^*)^2$ in expected time $O(\operatorname{len}(p)^3)$.

Note that when $p - 1 = 2^h m$ ($m$ odd), and $h$ is large (e.g., $h \approx \operatorname{len}(p)/2$), the algorithm in the previous exercise is asymptotically faster than the one in §12.5.1; however, the latter algorithm is likely to be faster in practice for the typical case where $h$ is small.

Exercise 12.10. Show that the following two problems are deterministic, poly- 
time equivalent (see discussion just above Exercise 11.10 in §11.3): 

(a) Given an odd prime $p$ and $\alpha \in (\mathbb{Z}_p^*)^2$, find $\beta \in \mathbb{Z}_p^*$ such that $\beta^2 = \alpha$.

(b) Given an odd prime $p$, find an element of $\mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$.

Exercise 12.11. Design and analyze an efficient, deterministic algorithm that takes as input primes $p$ and $q$, such that $q \mid (p - 1)$, along with an element $\alpha \in \mathbb{Z}_p^*$, and determines whether or not $\alpha \in (\mathbb{Z}_p^*)^q$.

Exercise 12.12. Design and analyze an efficient, deterministic algorithm that takes as input primes $p$ and $q$, such that $q \mid (p - 1)$ but $q^2 \nmid (p - 1)$, along with an element $\alpha \in (\mathbb{Z}_p^*)^q$, and computes a $q$th root of $\alpha$, that is, an element $\beta \in \mathbb{Z}_p^*$ such that $\beta^q = \alpha$.

Exercise 12.13. Design and analyze an algorithm that takes as input primes $p$ and $q$, such that $q \mid (p - 1)$, along with an element $\alpha \in (\mathbb{Z}_p^*)^q$, and computes a $q$th root of $\alpha$. (Unlike Exercise 12.12, we now allow $q^2 \mid (p - 1)$.) Your algorithm may be probabilistic, and should have an expected running time that is bounded by $q^{1/2}$ times a polynomial in $\operatorname{len}(p)$. Hint: Exercise 4.13 may be useful.


Exercise 12.14. Let $p$ be an odd prime, $\gamma$ be a generator for $\mathbb{Z}_p^*$, and $\alpha$ be any element of $\mathbb{Z}_p^*$. Define

$$B(p, \gamma, \alpha) := \begin{cases} 1 & \text{if } \log_\gamma \alpha \ge (p-1)/2; \\ 0 & \text{if } \log_\gamma \alpha < (p-1)/2. \end{cases}$$

Suppose that there is an algorithm that efficiently computes $B(p, \gamma, \alpha)$ for all $p, \gamma, \alpha$ as above. Show how to use this algorithm as a subroutine in an efficient, probabilistic algorithm that computes $\log_\gamma \alpha$ for all $p, \gamma, \alpha$ as above. Hint: in addition to the algorithm that computes $B$, use algorithms for testing quadratic residuosity and computing square roots modulo $p$, and “read off” the bits of $\log_\gamma \alpha$ one at a time.


Exercise 12.15. Suppose there is a probabilistic algorithm $A$ that takes as input a positive integer $n$, and an element $\alpha \in (\mathbb{Z}_n^*)^2$. Assume that for all $n$, and for a randomly chosen $\alpha \in (\mathbb{Z}_n^*)^2$, $A$ computes a square root of $\alpha$ with probability at least 0.001. Here, the probability is taken over the random choice of $\alpha$ and the random choices of $A$. Show how to use $A$ to construct another probabilistic algorithm $A'$ that takes $n$ as input, runs in expected polynomial time, and that satisfies the following property:

for all $n$, $A'$ outputs the complete factorization of $n$ into primes with probability at least 0.999.


Exercise 12.16. Suppose there is a probabilistic algorithm $A$ that takes as input positive integers $n$ and $m$, and an element $\alpha \in (\mathbb{Z}_n^*)^m$. It outputs either “failure,” or an $m$th root of $\alpha$. Furthermore, assume that $A$ runs in expected polynomial time, and that for all $n$ and $m$, and for randomly chosen $\alpha \in (\mathbb{Z}_n^*)^m$, $A$ succeeds in computing an $m$th root of $\alpha$ with probability $\varepsilon(n, m)$. Here, the probability is taken over the random choice of $\alpha$, as well as the random choices made during the execution of $A$. Show how to use $A$ to construct another probabilistic algorithm $A'$ that takes as input $n$, $m$, and $\alpha \in (\mathbb{Z}_n^*)^m$, runs in expected polynomial time, and that satisfies the following property:

if $\varepsilon(n, m) \ge 0.001$, then for all $\alpha \in (\mathbb{Z}_n^*)^m$, $A'$ computes an $m$th root of $\alpha$ with probability at least 0.999.


12.6 The quadratic residuosity assumption 

Loosely speaking, the quadratic residuosity (QR) assumption is the assumption that it is hard to distinguish squares from non-squares in $\mathbb{Z}_n^*$, where $n$ is of the form $n = pq$, and $p$ and $q$ are distinct primes. This assumption plays an important role in cryptography. Of course, since the Jacobi symbol is easy to compute, for this assumption to make sense, we have to restrict our attention to elements of $\operatorname{Ker} J_n$, where $J_n : \mathbb{Z}_n^* \to \{\pm 1\}$ is the Jacobi map. We know that $(\mathbb{Z}_n^*)^2 \subseteq \operatorname{Ker} J_n$ (see Exercise 12.3). Somewhat more precisely, the QR assumption is the assumption that it is hard to distinguish a random element in $\operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$ from a random element in $(\mathbb{Z}_n^*)^2$, given $n$ (but not its factorization!).

To give a rough idea as to how this assumption may be used in cryptography, assume that $p \equiv q \equiv 3 \pmod 4$, so that $[-1]_n \in \operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$, and moreover, $\operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2 = [-1]_n (\mathbb{Z}_n^*)^2$ (see Exercise 12.4). The value $n$ can be used as a public key in a public-key cryptosystem (see §4.7). Alice, knowing the public key, can encrypt a single bit $b \in \{0, 1\}$ as $\beta := (-1)^b \alpha^2$, where Alice chooses $\alpha \in \mathbb{Z}_n^*$ at random. The point is, if $b = 0$, then $\beta$ is uniformly distributed over $(\mathbb{Z}_n^*)^2$, and if $b = 1$, then $\beta$ is uniformly distributed over $\operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$. Now Bob, knowing the secret key, which is the factorization of $n$, can easily determine if $\beta \in (\mathbb{Z}_n^*)^2$ or not, and hence deduce the value of the encrypted bit $b$. However, under the QR assumption, an eavesdropper, seeing just $n$ and $\beta$, cannot effectively figure out what $b$ is.
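A toy version of this bit-encryption scheme (the one usually attributed to Goldwasser and Micali, who are cited in the notes) fits in a few lines of Python. The sketch assumes tiny, hypothetical primes $p = 103$ and $q = 107$, both $\equiv 3 \pmod 4$, so it offers no security and only shows the mechanics. Bob's decryption uses the observation that for $\beta = \pm\alpha^2$ it suffices to compute the Legendre symbol $(\beta \mid p)$, because $(-1 \mid p) = -1$ when $p \equiv 3 \pmod 4$. Function names are ours.

```python
import math
import random

# Hypothetical toy parameters: p ≡ q ≡ 3 (mod 4). Far too small for real use.
P, Q = 103, 107
N = P * Q  # public key


def encrypt_bit(n: int, b: int, rng=random) -> int:
    """Encrypt b in {0, 1} as beta = (-1)^b * alpha^2 mod n, alpha random in Z_n^*."""
    while True:
        alpha = rng.randrange(1, n)
        if math.gcd(alpha, n) == 1:
            break
    beta = alpha * alpha % n
    return (n - beta) % n if b else beta


def decrypt_bit(p: int, beta: int) -> int:
    """Recover b from beta using the secret prime p, since (beta | p) = (-1)^b."""
    return 0 if pow(beta, (p - 1) // 2, p) == 1 else 1
```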

Of course, the above scheme is much less efficient than the RSA cryptosystem presented in §4.7, but it nevertheless has attractive properties; in particular, its security is very closely tied to the QR assumption, whereas the security of RSA is a bit less well understood.


Exercise 12.17. Suppose that $A$ is a probabilistic algorithm that takes as input $n$ of the form $n = pq$, where $p$ and $q$ are distinct primes such that $p \equiv q \equiv 3 \pmod 4$. The algorithm also takes as input $\alpha \in \operatorname{Ker} J_n$, and outputs either 0 or 1. Furthermore, assume that $A$ runs in expected polynomial time. Define two random variables, $X_n$ and $Y_n$, as follows: $X_n$ is defined to be the output of $A$ on input $n$ and a value $\alpha$ chosen at random from $\operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$, and $Y_n$ is defined to be the output of $A$ on input $n$ and a value $\alpha$ chosen at random from $(\mathbb{Z}_n^*)^2$. In both cases, the value of the random variable is determined by the random choice of $\alpha$, as well as the random choices made by the algorithm. Define $\varepsilon(n) := |P[X_n = 1] - P[Y_n = 1]|$. Show how to use $A$ to design a probabilistic, expected polynomial time algorithm $A'$ that takes as input $n$ as above and $\alpha \in \operatorname{Ker} J_n$, and outputs either “square” or “non-square,” with the following property:

if $\varepsilon(n) \ge 0.001$, then for all $\alpha \in \operatorname{Ker} J_n$, the probability that $A'$ correctly identifies whether $\alpha \in (\mathbb{Z}_n^*)^2$ is at least 0.999.

Hint: use the Chernoff bound.
Exercise 12.18. Assume the same notation as in the previous exercise. Define the random variable $X'_n$ to be the output of $A$ on input $n$ and a value $\alpha$ chosen at random from $\operatorname{Ker} J_n$. Show that $|P[X'_n = 1] - P[Y_n = 1]| = \varepsilon(n)/2$. Thus, the problem of distinguishing $\operatorname{Ker} J_n$ from $(\mathbb{Z}_n^*)^2$ is essentially equivalent to the problem of distinguishing $\operatorname{Ker} J_n \setminus (\mathbb{Z}_n^*)^2$ from $(\mathbb{Z}_n^*)^2$.


12.7 Notes 

The proof we present here of Theorem 12.1 is essentially the one from Niven and 
Zuckerman [72]. Our proof of Theorem 12.5 follows closely the one found in Bach 
and Shallit [11]. 

Exercise 12.6 is based on Solovay and Strassen [99]. 

The probabilistic algorithm in §12.5.1 can be made deterministic under a generalization of the Riemann hypothesis. Indeed, as discussed in §10.5, under such a hypothesis, Bach’s result [10] implies that the least positive integer that is not a quadratic residue modulo $p$ is at most $2(\log p)^2$ (this follows by applying Bach’s result with the subgroup $(\mathbb{Z}_p^*)^2$ of $\mathbb{Z}_p^*$). Thus, we may find the required element $\gamma \in \mathbb{Z}_p^* \setminus (\mathbb{Z}_p^*)^2$ in deterministic polynomial time, just by brute-force search. The best unconditional bound on the smallest positive integer that is not a quadratic residue modulo $p$ is due to Burgess [22], who gives a bound of $p^{\alpha + o(1)}$, where $\alpha := 1/(4\sqrt{e}) \approx 0.15163$.

Goldwasser and Micali [41] introduced the quadratic residuosity assumption to 
cryptography (as discussed in § 12.6). This assumption has subsequently been used 
as the basis for numerous cryptographic schemes. 
Modules and vector spaces


In this chapter, we introduce the basic definitions and results concerning modules 
over a ring R and vector spaces over a field F. The reader may have seen some 
of these notions before, but perhaps only in the context of vector spaces over a 
specific field, such as the real or complex numbers, and not in the context of, say, 
finite fields like $\mathbb{Z}_p$.


13.1 Definitions, basic properties, and examples 

Throughout this section, R denotes a ring (i.e., a commutative ring with unity). 

Definition 13.1. An $R$-module is a set $M$ together with an addition operation on $M$ and a function $\mu : R \times M \to M$, such that the set $M$ under addition forms an abelian group, and moreover, for all $c, d \in R$ and $\alpha, \beta \in M$, we have:

(i) $\mu(c, \mu(d, \alpha)) = \mu(cd, \alpha)$;

(ii) $\mu(c + d, \alpha) = \mu(c, \alpha) + \mu(d, \alpha)$;

(iii) $\mu(c, \alpha + \beta) = \mu(c, \alpha) + \mu(c, \beta)$;

(iv) $\mu(1_R, \alpha) = \alpha$.

One may also call an $R$-module $M$ a module over $R$, and elements of $R$ are sometimes called scalars. The function $\mu$ in the definition is called a scalar multiplication map, and the value $\mu(c, \alpha)$ is called the scalar product of $c$ and $\alpha$. Usually, we shall simply write $c\alpha$ (or $c \cdot \alpha$) instead of $\mu(c, \alpha)$. When we do this, properties (i)-(iv) of the definition may be written as follows:

$$c(d\alpha) = (cd)\alpha, \quad (c + d)\alpha = c\alpha + d\alpha, \quad c(\alpha + \beta) = c\alpha + c\beta, \quad 1_R\alpha = \alpha.$$

Note that there are two addition operations at play here: addition in $R$ (such as $c + d$) and addition in $M$ (such as $\alpha + \beta$). Likewise, there are two multiplication operations at play: multiplication in $R$ (such as $cd$) and scalar multiplication (such as $c\alpha$). Note that by property (i), we may write $cd\alpha$ without any ambiguity, as both possible interpretations, $c(d\alpha)$ and $(cd)\alpha$, yield the same value.

For fixed $c \in R$, the map that sends $\alpha \in M$ to $c\alpha \in M$ is a group homomorphism with respect to the additive group operation of $M$ (by property (iii) of the definition); likewise, for fixed $\alpha \in M$, the map that sends $c \in R$ to $c\alpha \in M$ is a group homomorphism from the additive group of $R$ into the additive group of $M$ (by property (ii)). Combining these observations with basic facts about group homomorphisms (see Theorem 6.19), we may easily derive the following basic facts about $R$-modules:

Theorem 13.2. If $M$ is a module over $R$, then for all $c \in R$, $\alpha \in M$, and $k \in \mathbb{Z}$, we have:

(i) $0_R \cdot \alpha = 0_M$;

(ii) $c \cdot 0_M = 0_M$;

(iii) $(-c)\alpha = -(c\alpha) = c(-\alpha)$;

(iv) $(kc)\alpha = k(c\alpha) = c(k\alpha)$.

Proof. Exercise. □ 

An $R$-module $M$ may be trivial, consisting of just the zero element $0_M$. If $R$ is the trivial ring, then any $R$-module $M$ is trivial, since for every $\alpha \in M$, we have $\alpha = 1_R\alpha = 0_R\alpha = 0_M$.


Example 13.1. The ring $R$ itself can be viewed as an $R$-module in the obvious way, with addition and scalar multiplication defined in terms of the addition and multiplication operations of $R$. □

Example 13.2. The set $R^{\times n}$, which consists of all $n$-tuples of elements of $R$, forms an $R$-module, with addition and scalar multiplication defined component-wise: for $\alpha = (a_1, \ldots, a_n) \in R^{\times n}$, $\beta = (b_1, \ldots, b_n) \in R^{\times n}$, and $c \in R$, we define

$$\alpha + \beta := (a_1 + b_1, \ldots, a_n + b_n) \quad \text{and} \quad c\alpha := (ca_1, \ldots, ca_n).$$ □
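As a quick sanity check on Example 13.2, the following Python sketch takes $R = \mathbb{Z}_m$ (an assumption made only so that everything is finite), represents elements of $R^{\times n}$ as tuples of integers reduced mod $m$, and exhaustively verifies properties (i)-(iv) of the definition for small $m$ and $n$. The helper names `add`, `smul` and `check_module_axioms` are illustrative, not taken from the text.

```python
from itertools import product

def add(alpha, beta, m):
    """Component-wise addition in Z_m^{x n}."""
    return tuple((a + b) % m for a, b in zip(alpha, beta))

def smul(c, alpha, m):
    """Component-wise scalar multiplication c*alpha, for c in Z_m."""
    return tuple((c * a) % m for a in alpha)

def check_module_axioms(m, n):
    """Exhaustively check properties (i)-(iv) for R = Z_m acting on Z_m^{x n}."""
    elems = list(product(range(m), repeat=n))
    for c, d in product(range(m), repeat=2):
        for alpha in elems:
            # (i)  c(d alpha) = (cd) alpha
            if smul(c, smul(d, alpha, m), m) != smul(c * d % m, alpha, m):
                return False
            # (ii) (c + d) alpha = c alpha + d alpha
            if smul((c + d) % m, alpha, m) != add(smul(c, alpha, m), smul(d, alpha, m), m):
                return False
    for c in range(m):
        for alpha, beta in product(elems, repeat=2):
            # (iii) c(alpha + beta) = c alpha + c beta
            if smul(c, add(alpha, beta, m), m) != add(smul(c, alpha, m), smul(c, beta, m), m):
                return False
    # (iv) 1_R alpha = alpha
    return all(smul(1, alpha, m) == alpha for alpha in elems)

print(check_module_axioms(6, 2))   # True
```

The check is brute force and its cost grows quickly with $m^n$, so it is only meant for tiny parameters.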

Example 13.3. The ring of polynomials $R[X]$ over $R$ forms an $R$-module in the natural way, with addition and scalar multiplication defined in terms of the addition and multiplication operations of the polynomial ring. □

Example 13.4. As in Example 7.39, let $f$ be a non-zero polynomial over $R$ with $\mathrm{lc}(f) \in R^*$, and consider the quotient ring $E := R[X]/(f)$. Then $E$ is a module over $R$, with addition defined in terms of the addition operation of $E$, and scalar multiplication defined by $c[g]_f := [c]_f \cdot [g]_f = [cg]_f$, for $c \in R$ and $g \in R[X]$. □

Example 13.5. Generalizing Example 13.3, if $E$ is any ring containing $R$ as a subring (i.e., $E$ is an extension ring of $R$), then $E$ is a module over $R$, with addition and scalar multiplication defined in terms of the addition and multiplication operations of $E$. □

Example 13.6. Any abelian group $G$, written additively, can be viewed as a $\mathbb{Z}$-module, with scalar multiplication defined in terms of the usual integer multiplication map (see Theorem 6.4). □

Example 13.7. Let $G$ be any group, written additively, whose exponent divides $n$. Then we may define a scalar multiplication that maps $[k]_n \in \mathbb{Z}_n$ and $\alpha \in G$ to $k\alpha$. That this map is unambiguously defined follows from the fact that $G$ has exponent dividing $n$, so that if $k \equiv k' \pmod{n}$, we have $k\alpha - k'\alpha = (k - k')\alpha = 0_G$, since $n \mid (k - k')$. It is easy to check that this scalar multiplication map indeed makes $G$ into a $\mathbb{Z}_n$-module. □

Example 13.8. Of course, viewing a group as a module does not depend on whether or not we happen to use additive notation for the group operation. If we specialize the previous example to the group $G = \mathbb{Z}_p^*$, where $p$ is prime, then we may view $G$ as a $\mathbb{Z}_{p-1}$-module. However, since the group operation itself is written multiplicatively, the "scalar product" of $[k]_{p-1} \in \mathbb{Z}_{p-1}$ and $\alpha \in \mathbb{Z}_p^*$ is the power $\alpha^k$. □
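The following Python sketch checks Example 13.8 numerically for small primes $p$. It defines the "scalar product" of $k$ and $\alpha \in \mathbb{Z}_p^*$ as `pow(a, k % (p - 1), p)` and verifies, in multiplicative notation, that the power does not depend on the representative $k$ of $[k]_{p-1}$ and that properties (i)-(iv) hold. The function names are hypothetical; reducing $k$ modulo $p - 1$ is justified by Fermat's little theorem.

```python
def scalar(k, a, p):
    """The 'scalar product' of [k]_{p-1} and a in Z_p^*, i.e. a^k mod p."""
    return pow(a, k % (p - 1), p)

def check_zp_star_module(p):
    """Check well-definedness and properties (i)-(iv), written multiplicatively."""
    n = p - 1
    units = range(1, p)
    for a in units:
        # Well-definedness: k and k + n name the same element of Z_n.
        if any(pow(a, k, p) != pow(a, k + n, p) for k in range(n)):
            return False
        # (iv): [1] acting on a gives a^1 = a.
        if scalar(1, a, p) != a:
            return False
        for c in range(n):
            for d in range(n):
                # (i): (a^d)^c = a^(cd)
                if scalar(c, scalar(d, a, p), p) != scalar(c * d, a, p):
                    return False
                # (ii): a^(c+d) = a^c * a^d
                if scalar(c + d, a, p) != scalar(c, a, p) * scalar(d, a, p) % p:
                    return False
            for b in units:
                # (iii): (a*b)^c = a^c * b^c
                if scalar(c, a * b % p, p) != scalar(c, a, p) * scalar(c, b, p) % p:
                    return False
    return True

print(check_zp_star_module(11))   # True
print(scalar(9, 3, 7))            # 6, since 3^9 = 3^3 = 27 = 6 in Z_7
```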

Example 13.9. If $M_1, \ldots, M_k$ are $R$-modules, then so is their direct product $M_1 \times \cdots \times M_k$, where addition and scalar product are defined component-wise. If $M = M_1 = \cdots = M_k$, we write this as $M^{\times k}$. □

Example 13.10. If $I$ is an arbitrary set, and $M$ is an $R$-module, then $\mathrm{Map}(I, M)$, which is the set of all functions $f : I \to M$, may be naturally viewed as an $R$-module, with point-wise addition and scalar multiplication: for $f, g \in \mathrm{Map}(I, M)$ and $c \in R$, we define

$$(f + g)(i) := f(i) + g(i) \quad \text{and} \quad (cf)(i) := c f(i) \quad \text{for all } i \in I.$$ □


13.2 Submodules and quotient modules 

Again, throughout this section, $R$ denotes a ring. The notions of subgroups and quotient groups extend in the obvious way to $R$-modules.

Definition 13.3. Let $M$ be an $R$-module. A subset $N$ of $M$ is a submodule (over $R$) of $M$ if

(i) $N$ is a subgroup of the additive group $M$, and

(ii) $c\alpha \in N$ for all $c \in R$ and $\alpha \in N$ (i.e., $N$ is closed under scalar multiplication).

It is easy to see that a submodule $N$ of an $R$-module $M$ is also an $R$-module in its own right, with addition and scalar multiplication operations inherited from $M$.

Expanding the above definition, we see that a non-empty subset $N$ of $M$ is a submodule if and only if for all $c \in R$ and all $\alpha, \beta \in N$, we have

$$\alpha + \beta \in N, \quad -\alpha \in N, \quad \text{and} \quad c\alpha \in N.$$

Observe that the condition $-\alpha \in N$ is redundant, as it is implied by the condition $c\alpha \in N$ with $c = -1_R$.

Clearly, $\{0_M\}$ and $M$ are submodules of $M$. For $k \in \mathbb{Z}$, it is easy to see that not only are $kM$ and $M\{k\}$ subgroups of $M$ (see Theorems 6.7 and 6.8), they are also submodules of $M$. Moreover, for $c \in R$,

$$cM := \{c\alpha : \alpha \in M\} \quad \text{and} \quad M\{c\} := \{\alpha \in M : c\alpha = 0_M\}$$

are also submodules of $M$. Further, for $\alpha \in M$,

$$R\alpha := \{c\alpha : c \in R\}$$

is a submodule of $M$. Finally, if $N_1$ and $N_2$ are submodules of $M$, then $N_1 + N_2$ and $N_1 \cap N_2$ are not only subgroups of $M$, they are also submodules of $M$. We leave it to the reader to verify all these facts: they are quite straightforward.

Let $\alpha_1, \ldots, \alpha_k \in M$. The submodule

$$R\alpha_1 + \cdots + R\alpha_k$$

is called the submodule (over $R$) generated by $\alpha_1, \ldots, \alpha_k$. It consists of all $R$-linear combinations

$$c_1\alpha_1 + \cdots + c_k\alpha_k,$$

where the $c_i$'s are elements of $R$, and is the smallest submodule of $M$ that contains the elements $\alpha_1, \ldots, \alpha_k$. We shall also write this submodule as $\langle \alpha_1, \ldots, \alpha_k \rangle_R$. As a matter of definition, we allow $k = 0$, in which case this submodule is $\{0_M\}$. We say that $M$ is finitely generated (over $R$) if $M = \langle \alpha_1, \ldots, \alpha_k \rangle_R$ for some $\alpha_1, \ldots, \alpha_k \in M$.

Example 13.11. For a given integer $\ell \geq 0$, define $R[X]_{<\ell}$ to be the set of polynomials of degree less than $\ell$. The reader may verify that $R[X]_{<\ell}$ is a submodule of the $R$-module $R[X]$, and indeed, is the submodule generated by $1, X, \ldots, X^{\ell-1}$. If $\ell = 0$, then this submodule is the trivial submodule $\{0_R\}$. □

Example 13.12. Let $G$ be an abelian group. As in Example 13.6, we can view $G$ as a $\mathbb{Z}$-module in a natural way. Subgroups of $G$ are just the same thing as submodules of $G$, and for $a_1, \ldots, a_k \in G$, the subgroup $\langle a_1, \ldots, a_k \rangle$ is the same as the submodule $\langle a_1, \ldots, a_k \rangle_{\mathbb{Z}}$. □
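To make Example 13.12 concrete, here is a small Python sketch for the $\mathbb{Z}$-module $G = \mathbb{Z}_n$ (chosen only as an illustration). It computes $\langle a_1, \ldots, a_k \rangle_{\mathbb{Z}}$ as the closure of $\{0\}$ under adding generators, which suffices in a finite group because $-a = (n-1)a$, and compares the result with the set of multiples of $d = \gcd(a_1, \ldots, a_k, n)$, the familiar description of subgroups of a cyclic group. Both function names are hypothetical.

```python
from math import gcd

def generated_submodule(gens, n):
    """<gens>_Z inside Z_n: close {0} under adding generators. The order of
    exploration does not matter; every Z-linear combination is reached
    because -a = (n - 1)a in Z_n."""
    span, frontier = {0}, [0]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x + g) % n
            if y not in span:
                span.add(y)
                frontier.append(y)
    return span

def multiples_of_gcd(gens, n):
    """The subgroup d*Z_n, where d = gcd(gens..., n)."""
    d = n
    for g in gens:
        d = gcd(d, g)
    return set(range(0, n, d))

print(sorted(generated_submodule([8, 6], 12)))   # [0, 2, 4, 6, 8, 10]
```

With an empty list of generators the closure is $\{0\}$, matching the convention that the submodule generated by no elements is the trivial one.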
Example 13.13. As in Example 13.1, we may view the ring $R$ itself as an $R$-module. With respect to this module structure, ideals of $R$ are just the same thing as submodules of $R$, and for $a_1, \ldots, a_k \in R$, the ideal $(a_1, \ldots, a_k)$ is the same as the submodule $\langle a_1, \ldots, a_k \rangle_R$. Note that for $a \in R$, the ideal generated by $a$ may be written either as $aR$, using the notation introduced in §7.3, or as $Ra$, using the notation introduced in this section. □

Example 13.14. If $E$ is an extension ring of $R$, then we may view $E$ as an $R$-module, as in Example 13.5. It is easy to see that every ideal of $E$ is a submodule; however, the converse is not true in general. Indeed, when $R$ is non-trivial and $\ell > 0$, the submodule $R[X]_{<\ell}$ of $R[X]$ discussed in Example 13.11 is not an ideal of the ring $R[X]$: it contains $X^{\ell-1}$, but not $X \cdot X^{\ell-1} = X^\ell$. □

If $N$ is a submodule of $M$, then in particular, it is also a subgroup of $M$, and we can form the quotient group $M/N$ in the usual way (see §6.3), which consists of all cosets $[\alpha]_N$, where $\alpha \in M$. Moreover, because $N$ is closed under scalar multiplication, we can also define a scalar multiplication on $M/N$ in a natural way. Namely, for $c \in R$ and $\alpha \in M$, we define

$$c \cdot [\alpha]_N := [c\alpha]_N.$$

As usual, one must check that this definition is unambiguous, which means that $c\alpha \equiv c\alpha' \pmod{N}$ whenever $\alpha \equiv \alpha' \pmod{N}$. But this follows (as the reader may verify) from the fact that $N$ is closed under scalar multiplication. One can also easily check that with scalar multiplication defined in this way, $M/N$ is an $R$-module; it is called the quotient module (over $R$) of $M$ modulo $N$.

Example 13.15. Suppose $E$ is an extension ring of $R$, and $I$ is an ideal of $E$. Viewing $E$ as an $R$-module, $I$ is a submodule of $E$, and hence the quotient ring $E/I$ may naturally be viewed as an $R$-module, with scalar multiplication defined by $c \cdot [\alpha]_I := [c\alpha]_I$ for $c \in R$ and $\alpha \in E$. Example 13.4 is a special case of this, applied to the extension ring $R[X]$ and the ideal $(f)$. □


Exercise 13.1. Show that if $N$ is a submodule of an $R$-module $M$, then a set $P \subseteq N$ is a submodule of $M$ if and only if $P$ is a submodule of $N$.

Exercise 13.2. Let $M_1$ and $M_2$ be $R$-modules, and let $N_1$ be a submodule of $M_1$ and $N_2$ a submodule of $M_2$. Show that $N_1 \times N_2$ is a submodule of $M_1 \times M_2$.

Exercise 13.3. Show that if $R$ is non-trivial, then the $R$-module $R[X]$ is not finitely generated.
13.3 Module homomorphisms and isomorphisms

Again, throughout this section, R is a ring. The notion of a group homomorphism 
extends in the obvious way to R-modules. 

Definition 13.4. Let $M$ and $M'$ be modules over $R$. An $R$-module homomorphism from $M$ to $M'$ is a function $\rho : M \to M'$, such that

(i) $\rho$ is a group homomorphism from $M$ to $M'$, and

(ii) $\rho(c\alpha) = c\rho(\alpha)$ for all $c \in R$ and $\alpha \in M$.

An $R$-module homomorphism is also called an $R$-linear map. We shall use this terminology from now on. Expanding the definition, we see that a map $\rho : M \to M'$ is an $R$-linear map if and only if $\rho(\alpha + \beta) = \rho(\alpha) + \rho(\beta)$ and $\rho(c\alpha) = c\rho(\alpha)$ for all $\alpha, \beta \in M$ and all $c \in R$.

Example 13.16. If $N$ is a submodule of an $R$-module $M$, then the inclusion map $i : N \to M$ is obviously an $R$-linear map. □

Example 13.17. Suppose $N$ is a submodule of an $R$-module $M$. Then the natural map (see Example 6.36)

$$\rho : M \to M/N, \quad \alpha \mapsto [\alpha]_N$$

is not just a group homomorphism, it is also easily seen to be an $R$-linear map. □

Example 13.18. Let $M$ be an $R$-module, and let $k$ be an integer. Then the $k$-multiplication map on $M$ (see Example 6.38) is not only a group homomorphism, but it is also easily seen to be an $R$-linear map. Its image is the submodule $kM$, and its kernel the submodule $M\{k\}$. □

Example 13.19. Let $M$ be an $R$-module, and let $c$ be an element of $R$. The map

$$\rho : M \to M, \quad \alpha \mapsto c\alpha$$

is called the $c$-multiplication map on $M$, and is easily seen to be an $R$-linear map whose image is the submodule $cM$, and whose kernel is the submodule $M\{c\}$. The set of all $c \in R$ for which $cM = \{0_M\}$ is called the $R$-exponent of $M$, and is easily seen to be an ideal of $R$. □

Example 13.20. Let $M$ be an $R$-module, and let $\alpha$ be an element of $M$. The map

$$\rho : R \to M, \quad c \mapsto c\alpha$$

is easily seen to be an $R$-linear map whose image is the submodule $R\alpha$ (i.e., the submodule generated by $\alpha$). The kernel of this map is called the $R$-order of $\alpha$, and is easily seen to be an ideal of $R$. □
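For $R = \mathbb{Z}$, every ideal has the form $d\mathbb{Z}$ with $d \geq 0$, so the $\mathbb{Z}$-order of an element and the $\mathbb{Z}$-exponent of a finite module are each described by a single generator. The Python sketch below (an illustration with hypothetical function names; `math.lcm` needs Python 3.9 or later) computes these generators by brute force for $M = \mathbb{Z}_{n_1} \times \cdots \times \mathbb{Z}_{n_k}$. It uses two facts: the order ideal of $\alpha$ is generated by the least $c \geq 1$ with $c\alpha = 0_M$, and $cM = \{0_M\}$ exactly when $c$ lies in every order ideal, so the exponent is generated by the lcm of the element orders.

```python
from itertools import product
from math import lcm

def z_order(alpha, moduli):
    """Least c >= 1 with c*alpha = 0 in Z_{n_1} x ... x Z_{n_k}; the Z-order
    of alpha (the kernel of c |-> c*alpha) is then the ideal cZ."""
    c = 1
    while any((c * a) % n for a, n in zip(alpha, moduli)):
        c += 1
    return c

def z_exponent(moduli):
    """Generator of the Z-exponent {c in Z : cM = {0}} of M = Z_{n_1} x ... x Z_{n_k}."""
    e = 1
    for alpha in product(*(range(n) for n in moduli)):
        e = lcm(e, z_order(alpha, moduli))
    return e

M = (4, 6)
print(z_order((2, 3), M), z_order((1, 1), M), z_exponent(M))   # 2 12 12
```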

Example 13.21. Generalizing the previous example, let $M$ be an $R$-module, and let $\alpha_1, \ldots, \alpha_k$ be elements of $M$. The map

$$\rho : R^{\times k} \to M, \quad (c_1, \ldots, c_k) \mapsto c_1\alpha_1 + \cdots + c_k\alpha_k$$

is easily seen to be an $R$-linear map whose image is the submodule $R\alpha_1 + \cdots + R\alpha_k$ (i.e., the submodule generated by $\alpha_1, \ldots, \alpha_k$). □

Example 13.22. Suppose that $M_1, \ldots, M_k$ are submodules of an $R$-module $M$. Then the map

$$\rho : M_1 \times \cdots \times M_k \to M, \quad (\alpha_1, \ldots, \alpha_k) \mapsto \alpha_1 + \cdots + \alpha_k$$

is easily seen to be an $R$-linear map whose image is the submodule $M_1 + \cdots + M_k$. □

Example 13.23. Let $E$ be an extension ring of $R$. As we saw in Example 13.5, $E$ may be viewed as an $R$-module in a natural way. Let $\alpha \in E$, and consider the $\alpha$-multiplication map on $E$, which sends $\beta \in E$ to $\alpha\beta \in E$. Then it is easy to see that this is an $R$-linear map. □

Example 13.24. Let $E$ and $E'$ be extension rings of $R$, which may be viewed as $R$-modules as in Example 13.5. Suppose that $\rho : E \to E'$ is a ring homomorphism whose restriction to $R$ is the identity map (i.e., $\rho(c) = c$ for all $c \in R$). Then $\rho$ is an $R$-linear map. Indeed, for every $c \in R$ and $\alpha, \beta \in E$, we have $\rho(\alpha + \beta) = \rho(\alpha) + \rho(\beta)$ and $\rho(c\alpha) = \rho(c)\rho(\alpha) = c\rho(\alpha)$. □

Example 13.25. Let $G$ and $G'$ be abelian groups. As we saw in Example 13.6, $G$ and $G'$ may be viewed as $\mathbb{Z}$-modules. In addition, every group homomorphism $\rho : G \to G'$ is also a $\mathbb{Z}$-linear map. □

Since an $R$-module homomorphism is also a group homomorphism on the underlying additive groups, all of the statements in Theorem 6.19 apply. In particular, an $R$-linear map is injective if and only if the kernel is trivial (i.e., contains only the zero element). However, in the case of $R$-module homomorphisms, we can extend Theorem 6.19, as follows:

Theorem 13.5. Let $\rho : M \to M'$ be an $R$-linear map. Then:

(i) for every submodule $N$ of $M$, $\rho(N)$ is a submodule of $M'$; in particular (setting $N := M$), $\mathrm{Im}\,\rho$ is a submodule of $M'$;

(ii) for every submodule $N'$ of $M'$, $\rho^{-1}(N')$ is a submodule of $M$; in particular (setting $N' := \{0_{M'}\}$), $\mathrm{Ker}\,\rho$ is a submodule of $M$.


Proof. Exercise. □ 

Theorems 6.20 and 6.21 have natural $R$-module analogs, which the reader may easily verify:

Theorem 13.6. If $\rho : M \to M'$ and $\rho' : M' \to M''$ are $R$-linear maps, then so is their composition $\rho' \circ \rho : M \to M''$.

Theorem 13.7. Let $\rho_i : M \to M'_i$, for $i = 1, \ldots, k$, be $R$-linear maps. Then the map

$$\rho : M \to M'_1 \times \cdots \times M'_k, \quad \alpha \mapsto (\rho_1(\alpha), \ldots, \rho_k(\alpha))$$

is an $R$-linear map.

If an $R$-linear map $\rho : M \to M'$ is bijective, then it is called an $R$-module isomorphism of $M$ with $M'$. If such an $R$-module isomorphism $\rho$ exists, we say that $M$ is isomorphic to $M'$, and write $M \cong M'$. Moreover, if $M = M'$, then $\rho$ is called an $R$-module automorphism on $M$.

Theorems 6.22-6.26 also have natural $R$-module analogs, which the reader may easily verify:

Theorem 13.8. If $\rho$ is an $R$-module isomorphism of $M$ with $M'$, then the inverse function $\rho^{-1}$ is an $R$-module isomorphism of $M'$ with $M$.


Theorem 13.9 (First isomorphism theorem). Let $\rho : M \to M'$ be an $R$-linear map with kernel $K$ and image $N'$. Then we have an $R$-module isomorphism

$$M/K \cong N'.$$

Specifically, the map

$$\bar\rho : M/K \to M', \quad [\alpha]_K \mapsto \rho(\alpha)$$

is an injective $R$-linear map whose image is $N'$.

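Theorem 13.10. Let $\rho : M \to M'$ be an $R$-linear map. Then for every submodule $N$ of $M$ with $N \subseteq \mathrm{Ker}\,\rho$, we may define an $R$-linear map

$$\bar\rho : M/N \to M', \quad [\alpha]_N \mapsto \rho(\alpha).$$

Moreover, $\mathrm{Im}\,\bar\rho = \mathrm{Im}\,\rho$, and $\bar\rho$ is injective if and only if $N = \mathrm{Ker}\,\rho$.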
Theorem 13.11 (Internal direct product). Let $M$ be an $R$-module with submodules $N_1, N_2$, where $N_1 \cap N_2 = \{0_M\}$. Then we have an $R$-module isomorphism

$$N_1 \times N_2 \cong N_1 + N_2$$

given by the map

$$\rho : N_1 \times N_2 \to N_1 + N_2, \quad (\alpha_1, \alpha_2) \mapsto \alpha_1 + \alpha_2.$$

Theorem 13.12. Let $M$ and $M'$ be $R$-modules, and consider the $R$-module of functions $\mathrm{Map}(M, M')$ (see Example 13.10). Then

$$\mathrm{Hom}_R(M, M') := \{\sigma \in \mathrm{Map}(M, M') : \sigma \text{ is an } R\text{-linear map}\}$$

is a submodule of $\mathrm{Map}(M, M')$.

Example 13.26. Consider again the $R$-module $R[X]/(f)$ discussed in Example 13.4, where $f \in R[X]$ is of degree $\ell \geq 0$ and $\mathrm{lc}(f) \in R^*$. As an $R$-module, $R[X]/(f)$ is isomorphic to $R[X]_{<\ell}$ (see Example 13.11). Indeed, based on the observations in Example 7.39, the map $\rho : R[X]_{<\ell} \to R[X]/(f)$ that sends a polynomial $g \in R[X]$ of degree less than $\ell$ to $[g]_f \in R[X]/(f)$ is an isomorphism of $R[X]_{<\ell}$ with $R[X]/(f)$. Furthermore, $R[X]_{<\ell}$ is isomorphic as an $R$-module to $R^{\times \ell}$. Indeed, the map $\rho' : R[X]_{<\ell} \to R^{\times \ell}$ that sends $g = \sum_{i=0}^{\ell-1} a_i X^i \in R[X]_{<\ell}$ to $(a_0, \ldots, a_{\ell-1}) \in R^{\times \ell}$ is an isomorphism of $R[X]_{<\ell}$ with $R^{\times \ell}$. □
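The isomorphism $R[X]/(f) \cong R^{\times \ell}$ of Example 13.26 is what allows an element of $R[X]/(f)$ to be stored as a coefficient vector. The following Python sketch assumes $R = \mathbb{Z}_n$ and represents polynomials as coefficient lists, lowest degree first. The hypothetical helper `poly_mod` computes $\rho'(\rho^{-1}([g]_f))$, that is, the coefficient vector of the remainder of $g$ modulo $f$; division with remainder by $f$ needs $\mathrm{lc}(f) \in R^*$, as in Example 7.39.

```python
def poly_mod(g, f, n):
    """Coefficient vector (a_0, ..., a_{l-1}) of g mod f over Z_n, where l = deg(f).
    f[-1] is taken as lc(f); pow(..., -1, n) raises ValueError if it is not a unit."""
    ell = len(f) - 1
    inv_lc = pow(f[-1], -1, n)
    r = [x % n for x in g]
    # Cancel the coefficient of X^k for k = deg(g), ..., ell by subtracting q*X^(k-ell)*f.
    for k in range(len(r) - 1, ell - 1, -1):
        q = r[k] * inv_lc % n
        if q:
            for j in range(ell + 1):
                r[k - ell + j] = (r[k - ell + j] - q * f[j]) % n
    return tuple(r[:ell] + [0] * (ell - len(r)))

f = [1, 2, 5]                      # f = 5X^2 + 2X + 1 over Z_6; lc(f) = 5 is a unit
print(poly_mod([0, 0, 1], f, 6))   # (1, 2): X^2 = 2X + 1 in Z_6[X]/(f)
```

Since taking remainders modulo $f$ respects addition and multiplication by constants, `poly_mod` is $R$-linear, which is exactly what the isomorphism promises; polynomials of degree less than $\ell$ are returned unchanged.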

Exercise 13.4. Verify that the "is isomorphic to" relation on $R$-modules is an equivalence relation; that is, for all $R$-modules $M_1, M_2, M_3$, we have:

(a) $M_1 \cong M_1$;

(b) $M_1 \cong M_2$ implies $M_2 \cong M_1$;

(c) $M_1 \cong M_2$ and $M_2 \cong M_3$ implies $M_1 \cong M_3$.

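Exercise 13.5. Let $\rho_i : M_i \to M'_i$, for $i = 1, \ldots, k$, be $R$-linear maps. Show that the map

$$\rho : M_1 \times \cdots \times M_k \to M'_1 \times \cdots \times M'_k, \quad (\alpha_1, \ldots, \alpha_k) \mapsto (\rho_1(\alpha_1), \ldots, \rho_k(\alpha_k))$$

is an $R$-linear map.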

Exercise 13.6. Let $\rho : M \to M'$ be an $R$-linear map, and let $c \in R$. Show that $\rho(cM) = c\rho(M)$.

Exercise 13.7. Let $\rho : M \to M'$ be an $R$-linear map. Let $N$ be a submodule of $M$, and let $\tau : N \to M'$ be the restriction of $\rho$ to $N$. Show that $\tau$ is an $R$-linear map and that $\mathrm{Ker}\,\tau = \mathrm{Ker}\,\rho \cap N$.

Exercise 13.8. Suppose $M_1, \ldots, M_k$ are $R$-modules. Show that for each $i = 1, \ldots, k$, the projection map $\pi_i : M_1 \times \cdots \times M_k \to M_i$ that sends $(\alpha_1, \ldots, \alpha_k)$ to $\alpha_i$ is a surjective $R$-linear map.

Exercise 13.9. Show that if $M = M_1 \times M_2$ for $R$-modules $M_1$ and $M_2$, and $N_1$ is a submodule of $M_1$ and $N_2$ is a submodule of $M_2$, then we have an $R$-module isomorphism $M/(N_1 \times N_2) \cong M_1/N_1 \times M_2/N_2$.

Exercise 13.10. Let $M$ be an $R$-module with submodules $N_1$ and $N_2$. Show that we have an $R$-module isomorphism $(N_1 + N_2)/N_2 \cong N_1/(N_1 \cap N_2)$.

Exercise 13.11. Let $M$ be an $R$-module with submodules $N_1$, $N_2$, and $A$, where $N_2 \subseteq N_1$. Show that $(N_1 \cap A)/(N_2 \cap A)$ is isomorphic to a submodule of $N_1/N_2$.

Exercise 13.12. Let $\rho : M \to M'$ be an $R$-linear map with kernel $K$. Let $N$ be a submodule of $M$. Show that we have an $R$-module isomorphism $M/(N + K) \cong \rho(M)/\rho(N)$.

Exercise 13.13. Let $\rho : M \to M'$ be a surjective $R$-linear map. Let $S$ be the set of all submodules of $M$ that contain $\mathrm{Ker}\,\rho$, and let $S'$ be the set of all submodules of $M'$. Show that the sets $S$ and $S'$ are in one-to-one correspondence, via the map that sends $N \in S$ to $\rho(N) \in S'$.

13.4 Linear independence and bases 

Throughout this section, R denotes a ring. 

Definition 13.13. Let $M$ be an $R$-module, and let $\{\alpha_i\}_{i=1}^n$ be a family of elements of $M$. We say that $\{\alpha_i\}_{i=1}^n$

(i) is linearly dependent (over $R$) if there exist $c_1, \ldots, c_n \in R$, not all zero, such that $c_1\alpha_1 + \cdots + c_n\alpha_n = 0_M$;

(ii) is linearly independent (over $R$) if it is not linearly dependent;

(iii) spans $M$ (over $R$) if for every $\alpha \in M$, there exist $c_1, \ldots, c_n \in R$ such that $c_1\alpha_1 + \cdots + c_n\alpha_n = \alpha$;

(iv) is a basis for $M$ (over $R$) if it is linearly independent and spans $M$.

The family $\{\alpha_i\}_{i=1}^n$ always spans some submodule of $M$, namely, the submodule $N$ generated by $\alpha_1, \ldots, \alpha_n$. In this case, we may also call $N$ the submodule (over $R$) spanned by $\{\alpha_i\}_{i=1}^n$.

The family $\{\alpha_i\}_{i=1}^n$ may contain duplicates, in which case it is linearly dependent (unless $R$ is trivial). Indeed, if, say, $\alpha_1 = \alpha_2$, then setting $c_1 := 1$, $c_2 := -1$, and $c_3 := \cdots := c_n := 0$, we have the linear relation $\sum_{i=1}^n c_i\alpha_i = 0_M$.

If the family $\{\alpha_i\}_{i=1}^n$ contains $0_M$, then it is also linearly dependent (unless $R$ is trivial). Indeed, if, say, $\alpha_1 = 0_M$, then setting $c_1 := 1$ and $c_2 := \cdots := c_n := 0$, we have the linear relation $\sum_{i=1}^n c_i\alpha_i = 0_M$.

The family $\{\alpha_i\}_{i=1}^n$ may also be empty (i.e., $n = 0$), in which case it is linearly independent, and spans the submodule $\{0_M\}$.

In the above definition, the ordering of the elements $\alpha_1, \ldots, \alpha_n$ makes no difference. As such, when convenient, we may apply the terminology in the definition to any family $\{\alpha_i\}_{i \in I}$, where $I$ is an arbitrary, finite index set.
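Over a ring that is not a field, linear dependence can arise in ways that have no analog for vector spaces. The following brute-force Python sketch (with a hypothetical function name) applies Definition 13.13 directly to a family in the $\mathbb{Z}_m$-module $\mathbb{Z}_m^{\times n}$ by searching all coefficient vectors. For example, the one-element family $\{[3]_6\}$ in $\mathbb{Z}_6$ is linearly dependent, because $[2]_6 \cdot [3]_6 = [0]_6$, even though $[3]_6 \neq [0]_6$.

```python
from itertools import product

def is_linearly_independent(family, m):
    """Definition 13.13 by brute force, for a family of n-tuples over Z_m:
    search every coefficient vector (c_1, ..., c_k) that is not all zero
    for one with c_1 a_1 + ... + c_k a_k = 0."""
    if not family:
        return True                  # the empty family is linearly independent
    n = len(family[0])
    for coeffs in product(range(m), repeat=len(family)):
        if not any(coeffs):
            continue                 # skip the trivial relation
        combo = [sum(c * a[j] for c, a in zip(coeffs, family)) % m for j in range(n)]
        if not any(combo):
            return False             # found a non-trivial linear relation
    return True

print(is_linearly_independent([(3,)], 6))            # False: 2*3 = 0 in Z_6
print(is_linearly_independent([(3,)], 7))            # True
print(is_linearly_independent([(1, 1), (1, 1)], 5))  # False: the family has a duplicate
```

The search examines $m^k$ coefficient vectors, so it is only practical for very small families.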

Example 13.27. Consider the $R$-module $R^{\times n}$. Define $\alpha_1, \ldots, \alpha_n \in R^{\times n}$ as follows:

$$\alpha_1 := (1, 0, \ldots, 0), \quad \alpha_2 := (0, 1, 0, \ldots, 0), \quad \ldots, \quad \alpha_n := (0, \ldots, 0, 1);$$

that is, $\alpha_i$ has a 1 in position $i$ and is zero everywhere else. It is easy to see that $\{\alpha_i\}_{i=1}^n$ is a basis for $R^{\times n}$. Indeed, for all $c_1, \ldots, c_n \in R$, we have

$$c_1\alpha_1 + \cdots + c_n\alpha_n = (c_1, \ldots, c_n),$$

from which it is clear that $\{\alpha_i\}_{i=1}^n$ spans $R^{\times n}$ and is linearly independent. The family $\{\alpha_i\}_{i=1}^n$ is called the standard basis for $R^{\times n}$. □

Example 13.28. Consider the $\mathbb{Z}$-module $\mathbb{Z}^{\times 3}$. In addition to the standard basis, which consists of the tuples

$$(1, 0, 0), \quad (0, 1, 0), \quad (0, 0, 1),$$

the tuples

$$\alpha_1 := (1, 1, 1), \quad \alpha_2 := (0, 1, 0), \quad \alpha_3 := (2, 0, 1)$$

also form a basis. To see this, first observe that for all $c_1, c_2, c_3, d_1, d_2, d_3 \in \mathbb{Z}$, we have

$$(d_1, d_2, d_3) = c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3$$

if and only if

$$d_1 = c_1 + 2c_3, \quad d_2 = c_1 + c_2, \quad \text{and} \quad d_3 = c_1 + c_3. \qquad (13.1)$$

If (13.1) holds with $d_1 = d_2 = d_3 = 0$, then subtracting the equation $c_1 + c_3 = 0$ from $c_1 + 2c_3 = 0$, we see that $c_3 = 0$, from which it easily follows that $c_1 = c_2 = 0$. This shows that the family $\{\alpha_i\}_{i=1}^3$ is linearly independent. To show that it spans $\mathbb{Z}^{\times 3}$, the reader may verify that for any given $d_1, d_2, d_3 \in \mathbb{Z}$, the values

$$c_1 := -d_1 + 2d_3, \quad c_2 := d_1 + d_2 - 2d_3, \quad c_3 := d_1 - d_3$$

satisfy (13.1).

The family of tuples $(1, 1, 1), (0, 1, 0), (1, 0, 1)$ is not a basis, as it is linearly dependent: the third tuple is equal to the first minus the second.

The family of tuples $(1, 0, 12), (0, 1, 30), (0, 0, 18)$ is linearly independent, but does not span $\mathbb{Z}^{\times 3}$: the last component of any $\mathbb{Z}$-linear combination of these tuples must be divisible by $\gcd(12, 30, 18) = 6$. However, this family of tuples is a basis for the $\mathbb{Q}$-module $\mathbb{Q}^{\times 3}$. □
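The claims in Example 13.28 are easy to spot-check by machine. The Python sketch below verifies the coordinate formulas on a grid of integer targets and computes determinants, relying on a standard fact that is not proved in this section: the rows of a square integer matrix form a basis of $\mathbb{Z}^{\times n}$ exactly when its determinant is $\pm 1$, and a basis of $\mathbb{Q}^{\times n}$ exactly when its determinant is non-zero. The helper names are illustrative.

```python
from itertools import product

def combo(coeffs, vectors):
    """The Z-linear combination c_1 v_1 + ... + c_k v_k of integer tuples."""
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors)) for j in range(len(vectors[0])))

def det3(rows):
    """Determinant of a 3x3 integer matrix, by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coordinates(d):
    """The coefficients (c_1, c_2, c_3) given in Example 13.28."""
    d1, d2, d3 = d
    return (-d1 + 2 * d3, d1 + d2 - 2 * d3, d1 - d3)

alphas = [(1, 1, 1), (0, 1, 0), (2, 0, 1)]
assert all(combo(coordinates(d), alphas) == d for d in product(range(-3, 4), repeat=3))
print(det3(alphas))                                 # -1: a basis of Z^{x3}
print(det3([(1, 1, 1), (0, 1, 0), (1, 0, 1)]))      # 0: linearly dependent
print(det3([(1, 0, 12), (0, 1, 30), (0, 0, 18)]))   # 18: a basis of Q^{x3}, not of Z^{x3}
```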

Example 13.29. Consider again the submodule $R[X]_{<\ell}$ of $R[X]$, where $\ell \geq 0$, consisting of all polynomials of degree less than $\ell$ (see Example 13.11). Then $\{X^{i-1}\}_{i=1}^{\ell}$ is a basis for $R[X]_{<\ell}$ over $R$. □

Example 13.30. Consider again the ring $E = R[X]/(f)$, where $f \in R[X]$ with $\deg(f) = \ell \geq 0$ and $\mathrm{lc}(f) \in R^*$. As in Example 13.4, we may naturally view $E$ as a module over $R$. From the observations in Example 7.39, it is clear that $\{\xi^{i-1}\}_{i=1}^{\ell}$ is a basis for $E$ over $R$, where $\xi := [X]_f \in E$. □

The next theorem highlights a critical property of bases: 

Theorem 13.14. If $\{\alpha_i\}_{i=1}^n$ is a basis for an $R$-module $M$, then the map

$$\varepsilon : R^{\times n} \to M, \quad (c_1, \ldots, c_n) \mapsto c_1\alpha_1 + \cdots + c_n\alpha_n$$

is an $R$-module isomorphism. In particular, every element of $M$ can be expressed in a unique way as $c_1\alpha_1 + \cdots + c_n\alpha_n$, for $c_1, \ldots, c_n \in R$.

Proof. We already saw that $\varepsilon$ is an $R$-linear map in Example 13.21. Since $\{\alpha_i\}_{i=1}^n$ is linearly independent, it follows that the kernel of $\varepsilon$ is trivial, so that $\varepsilon$ is injective. That $\varepsilon$ is surjective follows immediately from the fact that $\{\alpha_i\}_{i=1}^n$ spans $M$. □

The following is an immediate corollary of this theorem: 

Theorem 13.15. Any two R-modules with bases of the same size are isomorphic. 

The following theorem develops an important connection between bases and 
linear maps. 

Theorem 13.16. Let $\{\alpha_i\}_{i=1}^n$ be a basis for an $R$-module $M$, and let $\rho : M \to M'$ be an $R$-linear map. Then:

(i) $\rho$ is surjective if and only if $\{\rho(\alpha_i)\}_{i=1}^n$ spans $M'$;

(ii) $\rho$ is injective if and only if $\{\rho(\alpha_i)\}_{i=1}^n$ is linearly independent;

(iii) $\rho$ is an isomorphism if and only if $\{\rho(\alpha_i)\}_{i=1}^n$ is a basis for $M'$.

Proof. By the previous theorem, we know that every element of $M$ can be written uniquely as $\sum_i c_i\alpha_i$, where the $c_i$'s are in $R$. Therefore, every element in $\mathrm{Im}\,\rho$ can be expressed as $\rho(\sum_i c_i\alpha_i) = \sum_i c_i\rho(\alpha_i)$. It follows that $\mathrm{Im}\,\rho$ is equal to the submodule of $M'$ spanned by $\{\rho(\alpha_i)\}_{i=1}^n$. From this, (i) is clear.

For (ii), consider a non-zero element $\sum_i c_i\alpha_i$ of $M$, so that not all $c_i$'s are zero. Now, $\sum_i c_i\alpha_i \in \mathrm{Ker}\,\rho$ if and only if $\sum_i c_i\rho(\alpha_i) = 0_{M'}$, and thus, $\mathrm{Ker}\,\rho$ is non-trivial if and only if $\{\rho(\alpha_i)\}_{i=1}^n$ is linearly dependent. That proves (ii).

(iii) follows from (i) and (ii). □

Exercise 13.14. Let $M$ be an $R$-module. Suppose $\{\alpha_i\}_{i=1}^n$ is a linearly independent family of elements of $M$. Show that for every $J \subseteq \{1, \ldots, n\}$, the subfamily $\{\alpha_j\}_{j \in J}$ is also linearly independent.

Exercise 13.15. Suppose $\rho : M \to M'$ is an $R$-linear map. Show that if $\{\alpha_i\}_{i=1}^n$ is a linearly dependent family of elements of $M$, then $\{\rho(\alpha_i)\}_{i=1}^n$ is also linearly dependent.

Exercise 13.16. Suppose $\rho : M \to M'$ is an injective $R$-linear map and that $\{\alpha_i\}_{i=1}^n$ is a linearly independent family of elements of $M$. Show that $\{\rho(\alpha_i)\}_{i=1}^n$ is linearly independent.

Exercise 13.17. Suppose that $\{\alpha_i\}_{i=1}^n$ spans an $R$-module $M$ and that $\rho : M \to M'$ is an $R$-linear map. Show that:

(a) $\rho$ is surjective if and only if $\{\rho(\alpha_i)\}_{i=1}^n$ spans $M'$;

(b) if $\{\rho(\alpha_i)\}_{i=1}^n$ is linearly independent, then $\rho$ is injective.


13.5 Vector spaces and dimension 

Throughout this section, F denotes a field. 

A module over a field is also called a vector space. In particular, an $F$-module is called an $F$-vector space, or a vector space over $F$.

For vector spaces over $F$, one typically uses the terms subspace and quotient space, instead of (respectively) submodule and quotient module; likewise, one usually uses the terms $F$-vector space homomorphism, isomorphism and automorphism, as appropriate.

We now develop the basic theory of dimension for finitely generated vector spaces. Recall that a vector space $V$ over $F$ is finitely generated if we have $V = \langle \alpha_1, \ldots, \alpha_n \rangle_F$ for some $\alpha_1, \ldots, \alpha_n \in V$. The main results here are that

• every finitely generated vector space has a basis, and

• all such bases have the same number of elements.

Throughout the rest of this section, $V$ denotes a vector space over $F$. We begin with a technical fact that will be used several times throughout this section:

Theorem 13.17. Suppose that $\{\alpha_i\}_{i=1}^n$ is a linearly independent family of elements that spans a subspace $W \subseteq V$, and that $\alpha_{n+1} \in V \setminus W$. Then $\{\alpha_i\}_{i=1}^{n+1}$ is also linearly independent.

Proof. Suppose we have a linear relation

$$0_V = c_1\alpha_1 + \cdots + c_n\alpha_n + c_{n+1}\alpha_{n+1},$$

where the $c_i$'s are in $F$. We want to show that all the $c_i$'s are zero. If $c_{n+1} \neq 0$, then we have

$$\alpha_{n+1} = -c_{n+1}^{-1}(c_1\alpha_1 + \cdots + c_n\alpha_n) \in W,$$

contradicting the assumption that $\alpha_{n+1} \notin W$. Therefore, we must have $c_{n+1} = 0$, and the linear independence of $\{\alpha_i\}_{i=1}^n$ implies that $c_1 = \cdots = c_n = 0$. □

The next theorem says that every finitely generated vector space has a basis, and 
in fact, any family that spans a vector space contains a subfamily that is a basis for 
the vector space. 

Theorem 13.18. Suppose $\{\alpha_i\}_{i=1}^n$ is a family of elements that spans $V$. Then for some subset $J \subseteq \{1, \ldots, n\}$, the subfamily $\{\alpha_j\}_{j \in J}$ is a basis for $V$.

Proof. We prove this by induction on $n$. If $n = 0$, the theorem is clear, so assume $n > 0$. Consider the subspace $W$ of $V$ spanned by $\{\alpha_i\}_{i=1}^{n-1}$. By the induction hypothesis, for some $K \subseteq \{1, \ldots, n-1\}$, the subfamily $\{\alpha_k\}_{k \in K}$ is a basis for $W$. There are two cases to consider.

Case 1: $\alpha_n \in W$. In this case, $W = V$, and the theorem clearly holds with $J := K$.

Case 2: $\alpha_n \notin W$. We claim that setting $J := K \cup \{n\}$, the subfamily $\{\alpha_j\}_{j \in J}$ is a basis for $V$. Indeed, since $\{\alpha_k\}_{k \in K}$ is linearly independent, and $\alpha_n \notin W$, Theorem 13.17 immediately implies that $\{\alpha_j\}_{j \in J}$ is linearly independent. Also, since $\{\alpha_k\}_{k \in K}$ spans $W$, it is clear that $\{\alpha_j\}_{j \in J}$ spans $W + \langle \alpha_n \rangle_F = V$. □
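Unrolling the induction in the proof of Theorem 13.18 gives a greedy procedure: scan $\alpha_1, \alpha_2, \ldots$ in order and keep $\alpha_i$ exactly when it is not in the span of the elements kept so far (Case 2), with Theorem 13.17 guaranteeing that the kept family stays linearly independent. The Python sketch below implements this for $V = F^{\times n}$ with $F = \mathbb{Z}_p$, $p$ prime (an assumption made for illustration), deciding membership in the current span by Gaussian elimination. The function names are hypothetical.

```python
def reduce_vec(v, pivots, p):
    """Subtract multiples of the stored rows (pivot column -> row with a 1 there
    and 0 at every earlier pivot column) so that v becomes 0 at each pivot column."""
    v = [x % p for x in v]
    for col, row in pivots.items():     # insertion order = order rows were added
        factor = v[col]
        if factor:
            v = [(x - factor * y) % p for x, y in zip(v, row)]
    return v

def extract_basis(family, p):
    """Return the 0-based indices J of a subfamily of `family` that is a basis
    of the subspace of (Z_p)^n spanned by `family`."""
    pivots, J = {}, []
    for i, alpha in enumerate(family):
        r = reduce_vec(alpha, pivots, p)
        nz = next((j for j, x in enumerate(r) if x), None)
        if nz is None:
            continue                    # Case 1: alpha_i lies in the current span W
        inv = pow(r[nz], -1, p)         # Case 2: alpha_i is outside W, so keep it
        pivots[nz] = [x * inv % p for x in r]
        J.append(i)
    return J

family = [(1, 2, 0), (2, 4, 0), (0, 1, 1), (1, 3, 1), (0, 0, 1)]
print(extract_basis(family, 5))         # [0, 2, 4]
```

Once Theorem 13.20 is available, `len(extract_basis(family, p))` can be read as the dimension of the subspace spanned by `family`.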

Theorem 13.19. If $V$ is spanned by some family of $n$ elements of $V$, then every family of $n + 1$ elements of $V$ is linearly dependent.

Proof. We prove this by induction on $n$. If $n = 0$, the theorem is clear, so assume that $n > 0$. Let $\{\alpha_i\}_{i=1}^n$ be a family that spans $V$, and let $\{\beta_i\}_{i=1}^{n+1}$ be an arbitrary family of elements of $V$. We wish to show that $\{\beta_i\}_{i=1}^{n+1}$ is linearly dependent.

We know that $\beta_{n+1}$ is a linear combination of the $\alpha_i$'s, say,

$$\beta_{n+1} = c_1\alpha_1 + \cdots + c_n\alpha_n. \qquad (13.2)$$

If all the $c_i$'s were zero, then we would have $\beta_{n+1} = 0_V$, and so trivially, $\{\beta_i\}_{i=1}^{n+1}$ is linearly dependent. So assume that some $c_i$ is non-zero, and for concreteness, say $c_n \neq 0$. Dividing equation (13.2) through by $c_n$, it follows that $\alpha_n$ is an $F$-linear combination of $\alpha_1, \ldots, \alpha_{n-1}, \beta_{n+1}$. Therefore,

$$\langle \alpha_1, \ldots, \alpha_{n-1}, \beta_{n+1} \rangle_F \supseteq \langle \alpha_1, \ldots, \alpha_{n-1} \rangle_F + \langle \alpha_n \rangle_F = V.$$

Now consider the subspace $W := \langle \beta_{n+1} \rangle_F$ and the quotient space $V/W$. Since the family of elements $\alpha_1, \ldots, \alpha_{n-1}, \beta_{n+1}$ spans $V$, it is easy to see that $\{[\alpha_i]_W\}_{i=1}^{n-1}$ spans $V/W$; therefore, by induction, $\{[\beta_i]_W\}_{i=1}^{n}$ is linearly dependent. This means that there exist $d_1, \ldots, d_n \in F$, not all zero, such that $d_1\beta_1 + \cdots + d_n\beta_n \equiv 0 \pmod{W}$, which means that for some $d_{n+1} \in F$, we have $d_1\beta_1 + \cdots + d_n\beta_n = d_{n+1}\beta_{n+1}$. That proves that $\{\beta_i\}_{i=1}^{n+1}$ is linearly dependent. □
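Theorem 13.19 can also be checked computationally. The Python sketch below, for $V = F^{\times n}$ with $F = \mathbb{Z}_p$ and $p$ prime (assumptions made for illustration), finds an explicit relation $d_1\beta_1 + \cdots + d_k\beta_k = 0_V$ with not all $d_i$ zero by Gaussian elimination, carrying along, for each echelon row, its coefficients with respect to the original family. Since $F^{\times n}$ is spanned by the $n$ standard basis vectors, the theorem guarantees that the search succeeds whenever $k \geq n + 1$. The function name is hypothetical.

```python
def find_relation(family, p):
    """Return coefficients [d_1, ..., d_k] over Z_p, not all zero, with
    d_1 b_1 + ... + d_k b_k = 0, or None if the family is linearly independent."""
    k = len(family)
    pivots = {}                          # pivot column -> (echelon row, its coefficients)
    for i, beta in enumerate(family):
        v = [x % p for x in beta]
        coeffs = [0] * k
        coeffs[i] = 1                    # invariant: v = sum_j coeffs[j] * b_j
        for col, (row, rc) in pivots.items():
            f = v[col]
            if f:
                v = [(x - f * y) % p for x, y in zip(v, row)]
                coeffs = [(x - f * y) % p for x, y in zip(coeffs, rc)]
        nz = next((j for j, x in enumerate(v) if x), None)
        if nz is None:
            return coeffs                # v = 0 and coeffs[i] = 1, so the relation is non-trivial
        inv = pow(v[nz], -1, p)
        pivots[nz] = ([x * inv % p for x in v], [x * inv % p for x in coeffs])
    return None

fam = [(1, 2), (3, 4), (2, 1)]           # three vectors in (Z_7)^2
print(find_relation(fam, 7))             # [6, 2, 1]: 6*(1,2) + 2*(3,4) + (2,1) = (14, 21) = 0
```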

An important corollary of Theorem 13.19 is the following: 

Theorem 13.20. If V is finitely generated, then any two bases for V have the same 
size. 

Proof. If one basis had more elements than another, then Theorem 13.19 would 
imply that the first basis was linearly dependent, which contradicts the definition 
of a basis. □ 

Theorem 13.20 allows us to make the following definition: 

Definition 13.21. If $V$ is finitely generated, the common size of any basis is called the dimension of $V$, and is denoted $\dim_F(V)$.

Note that from the definitions, we have $\dim_F(V) = 0$ if and only if $V$ is the trivial vector space (i.e., $V = \{0_V\}$). We also note that one often refers to a finitely generated vector space as a finite dimensional vector space. We shall give preference to this terminology from now on.

To summarize the main results in this section up to this point: if V is finite 
dimensional, it has a basis, and any two bases have the same size, which is called 
the dimension of V. 

Theorem 13.22. Suppose that $\dim_F(V) = n$, and that $\{\alpha_i\}_{i=1}^n$ is a family of $n$ elements of $V$. The following are equivalent:

(i) $\{\alpha_i\}_{i=1}^n$ is linearly independent;

(ii) $\{\alpha_i\}_{i=1}^n$ spans $V$;

(iii) $\{\alpha_i\}_{i=1}^n$ is a basis for $V$.

Proof. Let $W$ be the subspace of $V$ spanned by $\{\alpha_i\}_{i=1}^n$.

First, let us show that (i) implies (ii). Suppose $\{\alpha_i\}_{i=1}^n$ is linearly independent. Also, by way of contradiction, suppose that $W \subsetneq V$, and choose $\alpha_{n+1} \in V \setminus W$. Then Theorem 13.17 implies that $\{\alpha_i\}_{i=1}^{n+1}$ is linearly independent. But then we have a linearly independent family of $n + 1$ elements of $V$, which is impossible by Theorem 13.19.

Second, let us prove that (ii) implies (i). Let us assume that $\{\alpha_i\}_{i=1}^n$ is linearly dependent, and prove that $W \subsetneq V$. By Theorem 13.18, we can find a basis for $W$ among the $\alpha_i$'s, and since $\{\alpha_i\}_{i=1}^n$ is linearly dependent, this basis must contain strictly fewer than $n$ elements. Hence, $\dim_F(W) < \dim_F(V)$, and therefore, $W \subsetneq V$.

The theorem now follows from the above arguments, and the fact that, by definition, (iii) holds if and only if both (i) and (ii) hold. □

We next examine the dimension of subspaces of finite dimensional vector spaces. 

Theorem 13.23. Suppose that $V$ is finite dimensional and $W$ is a subspace of $V$. Then $W$ is also finite dimensional, with $\dim_F(W) \leq \dim_F(V)$. Moreover, $\dim_F(W) = \dim_F(V)$ if and only if $W = V$.

Proof. Suppose $\dim_F(V) = n$. Consider the set $S$ of all linearly independent families of the form $\{\alpha_i\}_{i=1}^m$, where $m \geq 0$ and each $\alpha_i$ is in $W$. The set $S$ is certainly non-empty, as it contains the empty family. Moreover, by Theorem 13.19, every member of $S$ must have at most $n$ elements. Therefore, we may choose some particular element $\{\alpha_i\}_{i=1}^m$ of $S$, where $m$ is as large as possible. We claim that this family $\{\alpha_i\}_{i=1}^m$ is a basis for $W$. By definition, $\{\alpha_i\}_{i=1}^m$ is linearly independent and spans some subspace $W'$ of $W$. If $W' \subsetneq W$, we can choose an element $\alpha_{m+1} \in W \setminus W'$, and by Theorem 13.17, the family $\{\alpha_i\}_{i=1}^{m+1}$ is linearly independent, and therefore, this family also belongs to $S$, contradicting the assumption that $m$ is as large as possible.

That proves that $W$ is finite dimensional with $\dim_F(W) \leq \dim_F(V)$. It remains to show that these dimensions are equal if and only if $W = V$. Now, if $W = V$, then clearly $\dim_F(W) = \dim_F(V)$. Conversely, if $\dim_F(W) = \dim_F(V)$, then by Theorem 13.22, any basis for $W$ must already span $V$. □

Theorem 13.24. If $V$ is finite dimensional, and $W$ is a subspace of $V$, then the quotient space $V/W$ is also finite dimensional, and

$$\dim_F(V/W) = \dim_F(V) - \dim_F(W).$$

Proof. Suppose that $\{\alpha_i\}_{i=1}^n$ spans $V$. Then it is clear that $\{[\alpha_i]_W\}_{i=1}^n$ spans $V/W$. By Theorem 13.18, we know that $V/W$ has a basis of the form $\{[\alpha_i]_W\}_{i=1}^{\ell}$, where $\ell \leq n$ (renumbering the $\alpha_i$'s as necessary). By Theorem 13.23, we know that $W$ has a basis, say $\{\beta_j\}_{j=1}^m$. The theorem will follow immediately from the following:
Claim. The elements

$$\alpha_1, \ldots, \alpha_\ell, \beta_1, \ldots, \beta_m \qquad (13.3)$$

form a basis for $V$.

To see that this family spans $V$, consider any element $\gamma$ of $V$. Then since $\{[\alpha_i]_W\}_{i=1}^{\ell}$ spans $V/W$, we have $\gamma \equiv \sum_i c_i\alpha_i \pmod{W}$ for some $c_1, \ldots, c_\ell \in F$. If we set $\beta := \gamma - \sum_i c_i\alpha_i \in W$, then since $\{\beta_j\}_{j=1}^m$ spans $W$, we have $\beta = \sum_j d_j\beta_j$ for some $d_1, \ldots, d_m \in F$, and hence $\gamma = \sum_i c_i\alpha_i + \sum_j d_j\beta_j$. That proves that the family of elements (13.3) spans $V$. To prove this family is linearly independent, suppose we have a relation of the form $\sum_i c_i\alpha_i + \sum_j d_j\beta_j = 0_V$, where $c_1, \ldots, c_\ell \in F$ and $d_1, \ldots, d_m \in F$. Reducing this relation modulo $W$ gives $\sum_i c_i[\alpha_i]_W = 0_{V/W}$, so if any of the $c_i$'s were non-zero, this would contradict the assumption that $\{[\alpha_i]_W\}_{i=1}^{\ell}$ is linearly independent. So all the $c_i$'s are zero, and the relation becomes $\sum_j d_j\beta_j = 0_V$. If any of the $d_j$'s were non-zero, this would contradict the assumption that $\{\beta_j\}_{j=1}^m$ is linearly independent. Thus, all the $c_i$'s and $d_j$'s must be zero, which proves that the family of elements (13.3) is linearly independent. That proves the claim. □

Theorem 13.25. If V is finite dimensional, then every linearly independent family
of elements of V can be extended to form a basis for V.

Proof. One can prove this by generalizing the proof of Theorem 13.18. Alternatively, we can adapt the proof of the previous theorem. Let $\{\beta_j\}_{j=1}^m$ be a linearly
independent family of elements that spans a subspace W of V. As in the proof of
the previous theorem, if $\{[\alpha_i]_W\}_{i=1}^\ell$ is a basis for the quotient space V/W, then the
elements

$$\alpha_1, \ldots, \alpha_\ell, \beta_1, \ldots, \beta_m$$

form a basis for V. □

Example 13.31. Suppose that F is finite, say |F| = q, and that V is finite dimensional, say $\dim_F(V) = n$. Then clearly $|V| = q^n$. If W is a subspace with
$\dim_F(W) = m$, then $|W| = q^m$, and by Theorem 13.24, $\dim_F(V/W) = n - m$, and
hence $|V/W| = q^{n-m}$. Just viewing V and W as additive groups, we know that the
index of W in V is $[V : W] = |V/W| = |V|/|W| = q^{n-m}$, which agrees with the
above calculations. □
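As a sanity check on Example 13.31, the following brute-force Python sketch (our own illustration, not from the text) assumes $F = \mathbb{Z}_p$ for a small prime p, so that $V = \mathbb{Z}_p^n$ can be enumerated outright. The helper names `span` and `count_cosets`, and the two vectors spanning W, are hypothetical choices; the point is only that $|W| = q^m$ and that W has $q^{n-m}$ cosets in V.

```python
from itertools import product

def span(vectors, p, n):
    """All linear combinations of `vectors` in Z_p^n, by brute force."""
    result = set()
    for coeffs in product(range(p), repeat=len(vectors)):
        v = tuple(sum(c * w[k] for c, w in zip(coeffs, vectors)) % p
                  for k in range(n))
        result.add(v)
    return result

def count_cosets(W, p, n):
    """Number of distinct cosets v + W in Z_p^n, i.e. the index [V : W]."""
    seen, count = set(), 0
    for v in product(range(p), repeat=n):
        if v in seen:
            continue
        count += 1  # v starts a new coset; mark every element of v + W as seen
        for w in W:
            seen.add(tuple((a + b) % p for a, b in zip(v, w)))
    return count

p, n = 3, 3
W = span([(1, 0, 2), (0, 1, 1)], p, n)   # two linearly independent vectors: m = 2
print(len(W), count_cosets(W, p, n))      # q^m and q^(n-m)
```

With p = 3, n = 3 and m = 2 it prints `9 3`.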

We next consider the relation between the notion of dimension and linear maps.
First, observe that by Theorem 13.15, if two finite dimensional vector spaces have
the same dimension, then they are isomorphic. The following theorem is the converse:

Theorem 13.26. If V is of finite dimension n, and V is isomorphic to V', then V'
is also of finite dimension n.

Proof. Let $\rho : V \to V'$ be an isomorphism. If $\{\alpha_i\}_{i=1}^n$ is a basis for V, then by Theorem 13.16, $\{\rho(\alpha_i)\}_{i=1}^n$ is a basis for
V'. □

Thus, two finite dimensional vector spaces are isomorphic if and only if they
have the same dimension.

We next illustrate one way in which the notion of dimension is particularly useful. In general, if we have a function $f : A \to B$, injectivity does not imply
surjectivity, nor does surjectivity imply injectivity. If A and B are finite sets of
equal size, then these implications do indeed hold. The following theorem gives us
another important setting where these implications hold, with finite dimensionality
playing the role corresponding to finite cardinality:

Theorem 13.27. If $\rho : V \to V'$ is an F-linear map, and if V and V' are finite
dimensional with $\dim_F(V) = \dim_F(V')$, then we have:

ρ is injective if and only if ρ is surjective.

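Proof. Let $\{\alpha_i\}_{i=1}^n$ be a basis for V. Then

$$\begin{aligned}
\rho \text{ is injective} &\iff \{\rho(\alpha_i)\}_{i=1}^n \text{ is linearly independent} && \text{(by Theorem 13.16)} \\
&\iff \{\rho(\alpha_i)\}_{i=1}^n \text{ spans } V' && \text{(by Theorem 13.22)} \\
&\iff \rho \text{ is surjective} && \text{(again by Theorem 13.16).}
\end{aligned}$$

□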

This theorem may be generalized as follows: 

Theorem 13.28. If V is finite dimensional, and $\rho : V \to V'$ is an F-linear map,
then $\operatorname{Im}\rho$ is a finite dimensional vector space, and

$$\dim_F(V) = \dim_F(\operatorname{Im}\rho) + \dim_F(\operatorname{Ker}\rho).$$

Proof. As the reader may verify, this follows immediately from Theorem 13.24, 
together with Theorems 13.26 and 13.9. □ 

Intuitively, one way to think of Theorem 13.28 is as a “law of conservation” for
dimension: any “dimensionality” going into ρ that is not “lost” to the kernel of ρ
must show up in the image of ρ.
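Over a finite field, Theorem 13.28 can be read as a counting statement, $|V| = |\operatorname{Im}\rho| \cdot |\operatorname{Ker}\rho|$, since each side is q raised to a dimension. The Python sketch below checks this by brute force, assuming $F = \mathbb{Z}_p$ and taking for ρ the map $v \mapsto vA$ on row vectors (matrices are treated formally in Chapter 14; the matrix A here is a hypothetical example with one dependent row).

```python
from itertools import product

def vec_mat(v, A, p):
    """Row vector v times matrix A, with entries reduced mod p."""
    return tuple(sum(v[i] * A[i][j] for i in range(len(A))) % p
                 for j in range(len(A[0])))

def image_and_kernel_sizes(A, p):
    """Enumerate all v in Z_p^m and return (|Im|, |Ker|) of the map v -> vA."""
    image, kernel_size = set(), 0
    for v in product(range(p), repeat=len(A)):
        w = vec_mat(v, A, p)
        image.add(w)
        if not any(w):
            kernel_size += 1
    return len(image), kernel_size

p = 5
A = [[1, 2, 3],
     [0, 1, 2],
     [1, 3, 0]]          # row 3 = row 1 + row 2 (mod 5)
img, ker = image_and_kernel_sizes(A, p)
print(img, ker, img * ker == p ** len(A))   # |V| = p^m with m = 3
```

Since the third row is the sum of the first two modulo 5, the image is 2-dimensional and the kernel 1-dimensional, and the script prints `25 5 True`.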

Exercise 13.18. Show that if $V_1, \ldots, V_n$ are finite dimensional vector spaces
over F, then $V_1 \times \cdots \times V_n$ has dimension $\sum_{i=1}^n \dim_F(V_i)$.

Exercise 13.19. Show that if V is a finite dimensional vector space over F with
subspaces $W_1$ and $W_2$, then

$$\dim_F(W_1 + W_2) = \dim_F(W_1) + \dim_F(W_2) - \dim_F(W_1 \cap W_2).$$
Exercise 13.20. From the previous exercise, one might be tempted to think that
a more general “inclusion/exclusion principle” for dimension holds. Determine if
the following statement is true or false: if V is a finite dimensional vector space
over F with subspaces $W_1$, $W_2$, and $W_3$, then

$$\begin{aligned}
\dim_F(W_1 + W_2 + W_3) ={}& \dim_F(W_1) + \dim_F(W_2) + \dim_F(W_3) \\
&- \dim_F(W_1 \cap W_2) - \dim_F(W_1 \cap W_3) - \dim_F(W_2 \cap W_3) \\
&+ \dim_F(W_1 \cap W_2 \cap W_3).
\end{aligned}$$

Exercise 13.21. Suppose that V and W are vector spaces over F, V is finite
dimensional, and $\{\alpha_i\}_{i=1}^k$ is a linearly independent family of elements of V. In
addition, let $\beta_1, \ldots, \beta_k$ be arbitrary elements of W. Show that there exists an F-linear map $\rho : V \to W$ such that $\rho(\alpha_i) = \beta_i$ for $i = 1, \ldots, k$.

Exercise 13.22. Let V be a vector space over F with basis $\{\alpha_i\}_{i=1}^n$. Let S be a
finite, non-empty subset of F, and define

$$B := \Big\{ \sum_{i=1}^n c_i\alpha_i : c_1, \ldots, c_n \in S \Big\}.$$

Show that if W is a subspace of V, with $W \subsetneq V$, then $|B \cap W| \le |S|^{n-1}$.

Exercise 13.23. The theory of dimension for finitely generated vector spaces is
quite elegant and powerful. There is a theory of dimension (of sorts) for modules
over an arbitrary, non-trivial ring R, but it is much more awkward and limited.
This exercise develops a proof of one aspect of this theory: if an R-module M has
a basis at all, then any two bases have the same size. To prove this, we need the fact
that any non-trivial ring has a maximal ideal (this was proved in Exercise 7.40 for
countable rings). Let n, m be positive integers, let $\alpha_1, \ldots, \alpha_m$ be elements of $R^{\times n}$,
and let I be an ideal of R.

(a) Show that if $\{\alpha_i\}_{i=1}^m$ spans $R^{\times n}$, then every element of $I^{\times n}$ can be expressed
as $c_1\alpha_1 + \cdots + c_m\alpha_m$, where $c_1, \ldots, c_m$ belong to I.

(b) Show that if m > n and I is a maximal ideal, then there exist $c_1, \ldots, c_m \in R$,
not all in I, such that $c_1\alpha_1 + \cdots + c_m\alpha_m \in I^{\times n}$.

(c) From (a) and (b), deduce that if m > n, then $\{\alpha_i\}_{i=1}^m$ cannot be a basis for
$R^{\times n}$.

(d) From (c), conclude that any two bases for a given R-module M must have
the same size.
In this chapter, we discuss basic definitions and results concerning matrices. We
shall start out with a very general point of view, discussing matrices whose entries 
lie in an arbitrary ring R. Then we shall specialize to the case where the entries lie 
in a field F, where much more can be said. 

One of the main goals of this chapter is to discuss “Gaussian elimination,” which 
is an algorithm that allows us to efficiently compute bases for the image and kernel 
of an F-linear map.

In discussing the complexity of algorithms for matrices over a ring R, we shall 
treat a ring R as an “abstract data type,” so that the running times of algorithms will 
be stated in terms of the number of arithmetic operations in R. If R is a finite ring, 
such as $\mathbb{Z}_m$, we can immediately translate this into a running time on a RAM (in
later chapters, we will discuss other finite rings and efficient algorithms for doing 
arithmetic in them). 

If R is, say, the field of rational numbers, a complete running time analysis
would require an additional analysis of the sizes of the numbers that appear in the
execution of the algorithm. We shall not attempt such an analysis here; however,
we note that all the algorithms discussed in this chapter do in fact run in polynomial time when $R = \mathbb{Q}$, assuming we represent rational numbers as fractions in
lowest terms. Another possible approach for dealing with rational numbers is to use 
floating point approximations. While this approach eliminates the size problem, it 
creates many new problems because of round-off errors. We shall not address any 
of these issues here. 


14.1 Basic definitions and properties 

Throughout this section, R denotes a ring. 

For positive integers m and n, an m × n matrix A over a ring R is a rectangular
array

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},$$

where each entry $a_{ij}$ in the array is an element of R; the element $a_{ij}$ is called the
(i, j) entry of A, which we denote by A(i, j). For i = 1, ..., m, the ith row of A is

$$(a_{i1}, \ldots, a_{in}),$$

which we denote by $\mathrm{Row}_i(A)$, and for j = 1, ..., n, the jth column of A is

$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix},$$

which we denote by $\mathrm{Col}_j(A)$. We regard a row of A as a 1 × n matrix, and a column
of A as an m × 1 matrix.

The set of all m × n matrices over R is denoted by $R^{m\times n}$. Elements of $R^{1\times n}$
are called row vectors (of dimension n) and elements of $R^{m\times 1}$ are called column vectors (of dimension m). Elements of $R^{n\times n}$ are called square matrices (of
dimension n). We do not make a distinction between $R^{1\times n}$ and $R^{\times n}$; that is, we
view standard n-tuples as row vectors.

We can define the familiar operations of matrix addition and scalar multiplication:

• If $A, B \in R^{m\times n}$, then A + B is the m × n matrix whose (i, j) entry is
A(i, j) + B(i, j).

• If $c \in R$ and $A \in R^{m\times n}$, then cA is the m × n matrix whose (i, j) entry is
cA(i, j).

The m × n zero matrix is the m × n matrix, all of whose entries are $0_R$; we denote
this matrix by $0_R^{m\times n}$ (or just 0, when the context is clear).

Theorem 14.1. With addition and scalar multiplication as defined above, $R^{m\times n}$ is
an R-module. The matrix $0_R^{m\times n}$ is the additive identity, and the additive inverse of
a matrix $A \in R^{m\times n}$ is the m × n matrix whose (i, j) entry is −A(i, j).

Proof. To prove this, one first verifies that matrix addition is associative and commutative, which follows from the associativity and commutativity of addition in R.
The claims made about the additive identity and additive inverses are also easily
verified. These observations establish that $R^{m\times n}$ is an abelian group. One also has
to check that all of the properties in Definition 13.1 hold. We leave this to the
reader. □

We can also define the familiar operation of matrix multiplication:

• If $A \in R^{m\times n}$ and $B \in R^{n\times p}$, then AB is the m × p matrix whose (i, k) entry
is

$$\sum_{j=1}^n A(i,j)\,B(j,k).$$

The n × n identity matrix is the matrix $I \in R^{n\times n}$, where $I(i,i) := 1_R$ and
$I(i,j) := 0_R$ for $i \ne j$. That is, I has $1_R$'s on the diagonal that runs from the
upper left corner to the lower right corner, and $0_R$'s everywhere else.

Theorem 14.2.

(i) Matrix multiplication is associative; that is, A(BC) = (AB)C for all
$A \in R^{m\times n}$, $B \in R^{n\times p}$, and $C \in R^{p\times q}$.

(ii) Matrix multiplication distributes over matrix addition; that is, A(C + D) =
AC + AD and (A + B)C = AC + BC for all $A, B \in R^{m\times n}$ and $C, D \in R^{n\times p}$.

(iii) The n × n identity matrix $I \in R^{n\times n}$ acts as a multiplicative identity; that
is, AI = A and IB = B for all $A \in R^{m\times n}$ and $B \in R^{n\times m}$; in particular,
CI = C = IC for all $C \in R^{n\times n}$.

(iv) Scalar multiplication and matrix multiplication associate; that is, c(AB) =
(cA)B = A(cB) for all $c \in R$, $A \in R^{m\times n}$, and $B \in R^{n\times p}$.

Proof. All of these are trivial, except for (i), which requires just a bit of computation to show that the (i, l) entry of both A(BC) and (AB)C is equal to (as the
reader may verify)

$$\sum_{\substack{1 \le j \le n \\ 1 \le k \le p}} A(i,j)\,B(j,k)\,C(k,l).$$

□

Note that while matrix addition is commutative, matrix multiplication in general 
is not. Indeed, Theorems 14.1 and 14.2 imply that R nxn satisfies all the properties 
of a ring except for commutativity of multiplication. 
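A direct Python transcription of the definition of matrix multiplication, with entries of $\mathbb{Z}_m$ represented as integers reduced modulo `mod` (our assumption; the text treats R abstractly). It spot-checks associativity on one example and shows that AB and BA can differ.

```python
def mat_mul(A, B, mod):
    """AB over Z_mod: the (i, k) entry is sum_j A(i, j) B(j, k), reduced mod `mod`."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][j] * B[j][k] for j in range(n)) % mod for k in range(p)]
            for i in range(m)]

def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

mod = 6
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 5], [1, 1]]
print(mat_mul(mat_mul(A, B, mod), C, mod) == mat_mul(A, mat_mul(B, C, mod), mod))
print(mat_mul(A, B, mod), mat_mul(B, A, mod))
```

Here AB = [[2, 1], [4, 3]] (B swaps the columns of A) while BA = [[3, 4], [1, 2]] (B swaps the rows of A).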

Some simple but useful facts to keep in mind are the following: 

• If $A \in R^{m\times n}$ and $B \in R^{n\times p}$, then the ith row of AB is equal to vB,
where $v = \mathrm{Row}_i(A)$; also, the kth column of AB is equal to Aw, where
$w = \mathrm{Col}_k(B)$.

• If $A \in R^{m\times n}$ and $v = (c_1, \ldots, c_m) \in R^{1\times m}$, then

$$vA = \sum_{i=1}^m c_i\,\mathrm{Row}_i(A).$$

In words: vA is a linear combination of the rows of A, with coefficients
taken from the corresponding entries of v.

• If $A \in R^{m\times n}$ and

$$w = \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix} \in R^{n\times 1},$$

then

$$Aw = \sum_{j=1}^n d_j\,\mathrm{Col}_j(A).$$

In words: Aw is a linear combination of the columns of A, with coefficients
taken from the corresponding entries of w.
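The last two facts can be checked numerically. In this Python sketch (integers stand in for R; the function names are ours), vA and Aw are each computed once from the definition of matrix multiplication and once as the stated linear combination of rows or columns.

```python
def vec_times_mat(v, A):
    """vA from the definition: entry j is sum_i v[i] * A(i, j)."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def row_combination(v, A):
    """sum_i c_i Row_i(A), accumulated one scaled row at a time."""
    acc = [0] * len(A[0])
    for c, row in zip(v, A):
        acc = [a + c * x for a, x in zip(acc, row)]
    return acc

def mat_times_col(A, w):
    """Aw from the definition, returned as a flat list of its m entries."""
    return [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(A))]

def col_combination(A, w):
    """sum_j d_j Col_j(A), accumulated one scaled column at a time."""
    acc = [0] * len(A)
    for j, d in enumerate(w):
        acc = [a + d * A[i][j] for i, a in enumerate(acc)]
    return acc

A = [[1, 2, 3],
     [4, 5, 6]]
v = [2, -1]        # c_1, c_2
w = [1, 0, -1]     # d_1, d_2, d_3
print(vec_times_mat(v, A), row_combination(v, A))
print(mat_times_col(A, w), col_combination(A, w))
```

Both lines print the same vector twice: `[-2, -1, 0]` for vA and `[-2, -2]` for Aw.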

If $A \in R^{m\times n}$, the transpose of A, denoted by $A^T$, is defined to be the n × m
matrix whose (j, i) entry is A(i, j).

Theorem 14.3. If $A, B \in R^{m\times n}$, $C \in R^{n\times p}$, and $c \in R$, then:

(i) $(A + B)^T = A^T + B^T$;

(ii) $(cA)^T = cA^T$;

(iii) $(A^T)^T = A$;

(iv) $(AC)^T = C^T A^T$.

Proof. Exercise. □
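A quick Python check of part (iv) on a non-square example, with integer entries (an illustrative choice). Note the reversal of order: for a 2 × 3 matrix A and a 3 × 2 matrix C, the product $A^T C^T$ would be 3 × 3, so it could not even equal the 2 × 2 matrix $(AC)^T$.

```python
def transpose(A):
    """The n x m matrix whose (j, i) entry is A(i, j)."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Integer matrix product: rows of A against columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
C = [[1, 0], [2, 1], [0, 3]]      # 3 x 2
lhs = transpose(mat_mul(A, C))
rhs = mat_mul(transpose(C), transpose(A))
print(lhs == rhs)
```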

If $A_i$ is an $n_i \times n_{i+1}$ matrix, for $i = 1, \ldots, k$, then by associativity of matrix
multiplication, we may write the product matrix $A_1 \cdots A_k$, which is an $n_1 \times n_{k+1}$
matrix, without any ambiguity.

For an n × n matrix A, and a positive integer k, we write $A^k$ to denote the product
A ⋯ A, where there are k terms in the product. Note that $A^1 = A$. We may extend
this notation to k = 0, defining $A^0$ to be the n × n identity matrix. One may readily
verify the usual rules of exponent arithmetic: for all non-negative integers k, l, we
have

$$(A^l)^k = A^{kl} = (A^k)^l \quad \text{and} \quad A^k A^l = A^{k+l}.$$

It is easy also to see that part (iv) of Theorem 14.3 implies that for all non-negative
integers k, we have

$$(A^k)^T = (A^T)^k.$$


Algorithmic issues 

For computational purposes, matrices are represented in the obvious way as arrays
of elements of R. As remarked at the beginning of this chapter, we shall treat R as
an “abstract data type,” and not worry about how elements of R are actually represented; in discussing the complexity of algorithms, we shall simply count “operations in R,” by which we mean additions, subtractions, and multiplications; we
shall sometimes also include equality testing and computing multiplicative inverses
as “operations in R.” In any real implementation, there will be other costs, such
as incrementing counters, and so on, which we may safely ignore, as long as their
number is at most proportional to the number of operations in R.

The following statements are easy to verify:

• We can multiply an m × n matrix by a scalar using mn operations in R.

• We can add two m × n matrices using mn operations in R.

• We can compute the product of an m × n matrix and an n × p matrix using
O(mnp) operations in R.

It is also easy to see that given an n × n matrix A, and a non-negative integer e, we
can adapt the repeated squaring algorithm discussed in §3.4 so as to compute $A^e$
using O(len(e)) multiplications of n × n matrices, and hence $O(\mathrm{len}(e)\,n^3)$ operations
in R.
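A sketch of matrix exponentiation by repeated squaring over $\mathbb{Z}_m$, in Python. It scans the bits of e from the most significant end, squaring at each step and multiplying by A when the bit is 1, so it uses at most 2 len(e) matrix multiplications; we make no claim that it matches the exact formulation in §3.4. The Fibonacci matrix is a hypothetical test case.

```python
def mat_mul_mod(A, B, m):
    """Product of two n x n matrices with entries reduced mod m."""
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) % m for k in range(n)]
            for i in range(n)]

def mat_pow_mod(A, e, m):
    """A^e over Z_m by repeated squaring; A^0 is the identity matrix."""
    n = len(A)
    result = [[(1 if i == j else 0) % m for j in range(n)] for i in range(n)]
    for bit in bin(e)[2:]:                   # bits of e, most significant first
        result = mat_mul_mod(result, result, m)
        if bit == "1":
            result = mat_mul_mod(result, A, m)
    return result

F = [[1, 1], [1, 0]]
print(mat_pow_mod(F, 10, 1000))
```

The output `[[89, 55], [55, 34]]` lists the Fibonacci numbers $F_{11}, F_{10}, F_{10}, F_9$, as expected for this matrix.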

Exercise 14.1. Let $A \in R^{m\times n}$. Show that if $vA = 0_R^{1\times n}$ for all $v \in R^{1\times m}$, then
$A = 0_R^{m\times n}$.

14.2 Matrices and linear maps 

Let R be a ring. 

For positive integers m and n, consider the R-modules $R^{1\times m}$ and $R^{1\times n}$. If A is
an m × n matrix over R, then the map

$$\lambda_A : R^{1\times m} \to R^{1\times n}, \qquad v \mapsto vA$$

is easily seen to be an R-linear map; this follows immediately from parts (ii) and
(iv) of Theorem 14.2. We call $\lambda_A$ the linear map corresponding to A.



If $v = (c_1, \ldots, c_m) \in R^{1\times m}$, then

$$\lambda_A(v) = vA = \sum_{i=1}^m c_i\,\mathrm{Row}_i(A).$$

From this, it is clear that

• the image of $\lambda_A$ is the submodule of $R^{1\times n}$ spanned by $\{\mathrm{Row}_i(A)\}_{i=1}^m$; in
particular, $\lambda_A$ is surjective if and only if $\{\mathrm{Row}_i(A)\}_{i=1}^m$ spans $R^{1\times n}$;

• $\lambda_A$ is injective if and only if $\{\mathrm{Row}_i(A)\}_{i=1}^m$ is linearly independent.

There is a close connection between matrix multiplication and composition of
corresponding linear maps. Specifically, let $A \in R^{m\times n}$ and $B \in R^{n\times p}$, and consider
the corresponding linear maps $\lambda_A : R^{1\times m} \to R^{1\times n}$ and $\lambda_B : R^{1\times n} \to R^{1\times p}$. Then
we have

$$\lambda_B \circ \lambda_A = \lambda_{AB}. \qquad (14.1)$$

This follows immediately from the associativity of matrix multiplication.
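Equation (14.1) can be checked directly by representing $\lambda_A$ as a Python closure (the name `lam` is ours, and integer entries stand in for R): applying $\lambda_A$ and then $\lambda_B$ to a row vector gives the same result as applying $\lambda_{AB}$.

```python
def lam(A):
    """Return the linear map v -> vA (row vector times A) as a Python function."""
    def apply(v):
        return [sum(v[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    return apply

def mat_mul(A, B):
    """Integer matrix product: rows of A against columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2], [0, 1], [3, 1]]    # 3 x 2, so lam(A) maps R^{1x3} to R^{1x2}
B = [[2, 0, 1], [1, 1, 0]]      # 2 x 3, so lam(B) maps R^{1x2} to R^{1x3}
v = [1, -1, 2]
print(lam(B)(lam(A)(v)) == lam(mat_mul(A, B))(v))
```

Both sides evaluate to `[17, 3, 7]` here.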

We have seen how vector/matrix multiplication defines a linear map. Conversely,
we shall now see that the action of any R-linear map can be viewed as a vector/matrix multiplication, provided the R-modules involved have bases (which will
always be the case for finite dimensional vector spaces).

Let M be an R-module, and suppose that $S = \{\alpha_i\}_{i=1}^m$ is a basis for M, where
m > 0. As we know (see Theorem 13.14), every element $\alpha \in M$ can be written
uniquely as $c_1\alpha_1 + \cdots + c_m\alpha_m$, where the $c_i$'s are in R. Let us define

$$\mathrm{Vec}_S(\alpha) := (c_1, \ldots, c_m) \in R^{1\times m}.$$

We call $\mathrm{Vec}_S(\alpha)$ the coordinate vector of α relative to S. The function

$$\mathrm{Vec}_S : M \to R^{1\times m}$$

is an R-module isomorphism (it is the inverse of the isomorphism ε in Theorem 13.14).

Let N be another R-module, and suppose that $T = \{\beta_j\}_{j=1}^n$ is a basis for N,
where n > 0. Just as in the previous paragraph, every element $\beta \in N$ has a unique
coordinate vector $\mathrm{Vec}_T(\beta) \in R^{1\times n}$ relative to T.

Now let $\rho : M \to N$ be an arbitrary R-linear map. Our goal is to define a matrix
$A \in R^{m\times n}$ with the following property:

$$\mathrm{Vec}_T(\rho(\alpha)) = \mathrm{Vec}_S(\alpha)A \quad \text{for all } \alpha \in M. \qquad (14.2)$$

In words: if we multiply the coordinate vector of α on the right by A, we get the
coordinate vector of ρ(α).
Constructing such a matrix A is easy: we define A to be the matrix whose ith
row, for i = 1, ..., m, is the coordinate vector of $\rho(\alpha_i)$ relative to T. That is,

$$\mathrm{Row}_i(A) = \mathrm{Vec}_T(\rho(\alpha_i)) \quad \text{for } i = 1, \ldots, m.$$

Then for an arbitrary $\alpha \in M$, if $(c_1, \ldots, c_m)$ is the coordinate vector of α relative to
S, we have

$$\rho(\alpha) = \rho\Big(\sum_i c_i\alpha_i\Big) = \sum_i c_i\,\rho(\alpha_i),$$

and so

$$\mathrm{Vec}_T(\rho(\alpha)) = \sum_i c_i\,\mathrm{Vec}_T(\rho(\alpha_i)) = \sum_i c_i\,\mathrm{Row}_i(A) = \mathrm{Vec}_S(\alpha)A.$$
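The construction of A is mechanical: apply ρ to each basis vector and record the resulting coordinate vectors as rows. The Python sketch below does this for a hypothetical choice not taken from the text: $M = N = \mathbb{Z}_p[X]_{<k}$ with $S = T = \{1, X, \ldots, X^{k-1}\}$, and ρ the formal derivative, which is a $\mathbb{Z}_p$-linear map. Polynomials are stored as coefficient lists, which are exactly their coordinate vectors relative to S.

```python
def derivative(coeffs, p):
    """Formal derivative of sum_i coeffs[i] X^i over Z_p, padded to the same length k."""
    k = len(coeffs)
    return [(i * coeffs[i]) % p for i in range(1, k)] + [0]

def matrix_of(rho, k, p):
    """Row i (0-based) is the coordinate vector of rho(X^i) relative to {1, X, ..., X^(k-1)}."""
    rows = []
    for i in range(k):
        basis_vector = [1 if t == i else 0 for t in range(k)]   # coordinates of X^i
        rows.append(rho(basis_vector, p))
    return rows

def vec_mat(v, A, p):
    """Row vector v times matrix A over Z_p."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) % p for j in range(len(A[0]))]

p, k = 7, 4
A = matrix_of(derivative, k, p)
g = [5, 3, 0, 2]                              # g = 5 + 3X + 2X^3
print(A)
print(vec_mat(g, A, p) == derivative(g, p))  # property (14.2) for this g
```

For k = 4 the matrix is `[[0, 0, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0]]`, and the last line prints `True`.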

Furthermore, A is the only matrix satisfying (14.2). Indeed, if A' also satisfies
(14.2), then subtracting, we obtain

$$\mathrm{Vec}_S(\alpha)(A - A') = 0_R^{1\times n}$$

for all $\alpha \in M$. Since the map $\mathrm{Vec}_S : M \to R^{1\times m}$ is surjective, this means that
v(A − A') is zero for all $v \in R^{1\times m}$, and from this, it is clear (see Exercise 14.1) that
A − A' is the zero matrix, and so A = A'.

We call the unique matrix A satisfying (14.2) the matrix of ρ relative to S and
T, and denote it by $\mathrm{Mat}_{S,T}(\rho)$.

Recall that $\mathrm{Hom}_R(M, N)$ is the R-module consisting of all R-linear maps from
M to N (see Theorem 13.12). We can view $\mathrm{Mat}_{S,T}$ as a function mapping elements of $\mathrm{Hom}_R(M, N)$ to elements of $R^{m\times n}$.

Theorem 14.4. The function $\mathrm{Mat}_{S,T} : \mathrm{Hom}_R(M, N) \to R^{m\times n}$ is an R-module
isomorphism. In particular, for every $A \in R^{m\times n}$, the pre-image of A under $\mathrm{Mat}_{S,T}$
is $\mathrm{Vec}_T^{-1} \circ \lambda_A \circ \mathrm{Vec}_S$, where $\lambda_A : R^{1\times m} \to R^{1\times n}$ is the linear map corresponding to
A.

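Proof. To show that $\mathrm{Mat}_{S,T}$ is an R-linear map, let $\rho, \rho' \in \mathrm{Hom}_R(M, N)$, and let
$c \in R$. Also, let $A := \mathrm{Mat}_{S,T}(\rho)$ and $A' := \mathrm{Mat}_{S,T}(\rho')$. Then for all $\alpha \in M$, we
have

$$\begin{aligned}
\mathrm{Vec}_T((\rho + \rho')(\alpha)) &= \mathrm{Vec}_T(\rho(\alpha) + \rho'(\alpha)) = \mathrm{Vec}_T(\rho(\alpha)) + \mathrm{Vec}_T(\rho'(\alpha)) \\
&= \mathrm{Vec}_S(\alpha)A + \mathrm{Vec}_S(\alpha)A' = \mathrm{Vec}_S(\alpha)(A + A').
\end{aligned}$$

As this holds for all $\alpha \in M$, and since the matrix of a linear map is uniquely
determined, we must have $\mathrm{Mat}_{S,T}(\rho + \rho') = A + A'$. A similar argument shows
that $\mathrm{Mat}_{S,T}(c\rho) = cA$. This shows that $\mathrm{Mat}_{S,T}$ is an R-linear map.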

To show that the map $\mathrm{Mat}_{S,T}$ is injective, it suffices to show that its kernel is
trivial. If ρ is in the kernel of this map, then setting $A := 0_R^{m\times n}$ in (14.2), we see
that $\mathrm{Vec}_T(\rho(\alpha))$ is zero for all $\alpha \in M$. But since the map $\mathrm{Vec}_T : N \to R^{1\times n}$ is
injective, this implies ρ(α) is zero for all $\alpha \in M$. Thus, ρ must be the zero map.

To show surjectivity, we show that every $A \in R^{m\times n}$ has a pre-image under
$\mathrm{Mat}_{S,T}$ as described in the statement of the theorem. So let A be an m × n matrix,
and let $\rho := \mathrm{Vec}_T^{-1} \circ \lambda_A \circ \mathrm{Vec}_S$. Again, since the matrix of a linear map is uniquely
determined, it suffices to show that (14.2) holds for this particular A and ρ. For
every $\alpha \in M$, we have

$$\mathrm{Vec}_T(\rho(\alpha)) = \mathrm{Vec}_T(\mathrm{Vec}_T^{-1}(\lambda_A(\mathrm{Vec}_S(\alpha)))) = \lambda_A(\mathrm{Vec}_S(\alpha)) = \mathrm{Vec}_S(\alpha)A.$$

That proves the theorem. □

As a special case of the above, suppose that $M = R^{1\times m}$ and $N = R^{1\times n}$, and S
and T are the standard bases for M and N (see Example 13.27). In this case, the
functions $\mathrm{Vec}_S$ and $\mathrm{Vec}_T$ are the identity maps, and the previous theorem implies
that the function

$$\Lambda : R^{m\times n} \to \mathrm{Hom}_R(R^{1\times m}, R^{1\times n}), \qquad A \mapsto \lambda_A$$

is the inverse of the function $\mathrm{Mat}_{S,T} : \mathrm{Hom}_R(R^{1\times m}, R^{1\times n}) \to R^{m\times n}$. Thus, the
function Λ is also an R-module isomorphism.

To summarize, we see that an R-linear map ρ from M to N, together with
particular bases for M and N, uniquely determines a matrix A such that the action
of multiplication on the right by A implements the action of ρ with respect to the
given bases. There may be many bases for M and N to choose from, and different 
choices will in general lead to different matrices. Also, note that in general, a basis 
may be indexed by an arbitrary finite set; however, in defining coordinate vectors 
and matrices of linear maps, the index set must be ordered in some way. In any 
case, from a computational perspective, the matrix A gives us an efficient way to 
compute the map p, assuming elements of M and N are represented as coordinate 
vectors with respect to the given bases. 

We have taken a “row-centric” point of view. Of course, if one prefers, by simply 
transposing everything, one can equally well take a “column-centric” point of view, 
where the action of ρ corresponds to multiplication of a column vector on the left
by a matrix. 

Example 14.1. Consider the quotient ring $E = R[X]/(f)$, where $f \in R[X]$ with
$\deg(f) = \ell > 0$ and $\mathrm{lc}(f) \in R^*$. Let $\xi := [X]_f \in E$. As an R-module, E
has a basis $S := \{\xi^{i-1}\}_{i=1}^{\ell}$ (see Example 13.30). Let $\rho : E \to E$ be the $\xi$-multiplication map, which sends $\alpha \in E$ to $\xi\alpha \in E$. This is an R-linear map. If
$f = c_0 + c_1X + \cdots + c_{\ell-1}X^{\ell-1} + c_\ell X^\ell$, then the matrix of ρ relative to S is the ℓ × ℓ
matrix

$$A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-c_0/c_\ell & -c_1/c_\ell & -c_2/c_\ell & \cdots & -c_{\ell-1}/c_\ell
\end{pmatrix},$$

where for $i = 1, \ldots, \ell - 1$, the ith row of A contains a 1 in position i + 1, and is
zero everywhere else. The matrix A is called the companion matrix of f. □
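The following Python sketch builds the companion matrix over $\mathbb{Z}_p$ (p prime, so $\mathrm{lc}(f)$ is invertible whenever it is non-zero) and checks, for one element α, that multiplying $\mathrm{Vec}_S(\alpha)$ by A agrees with computing $X \cdot a(X) \bmod f(X)$ directly. The function names and the sample f are hypothetical; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def companion(f, p):
    """Companion matrix of f = f[0] + f[1] X + ... + f[l] X^l over Z_p (f[l] invertible mod p)."""
    l = len(f) - 1
    inv_lc = pow(f[l], -1, p)                       # c_l^{-1} mod p
    A = [[1 if j == i + 1 else 0 for j in range(l)] for i in range(l - 1)]
    A.append([(-c * inv_lc) % p for c in f[:l]])    # last row: -c_i / c_l
    return A

def times_xi(a, f, p):
    """Coefficients of X * a(X) mod f(X) over Z_p, where a has l = deg(f) coefficients."""
    l = len(f) - 1
    shifted = [0] + a                               # X * a(X), degree <= l
    top = shifted[l]
    inv_lc = pow(f[l], -1, p)
    # subtract (top / c_l) * f(X) to cancel the X^l term
    return [(shifted[i] - top * inv_lc * f[i]) % p for i in range(l)]

def vec_mat(v, A, p):
    """Row vector v times matrix A over Z_p."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) % p for j in range(len(A[0]))]

p = 7
f = [3, 0, 5, 2]                 # f = 3 + 5X^2 + 2X^3, so l = 3
A = companion(f, p)
a = [1, 4, 6]                    # alpha = 1 + 4 xi + 6 xi^2, i.e. Vec_S(alpha)
print(vec_mat(a, A, p) == times_xi(a, f, p))
```

For this f the last row of A is (2, 0, 1), and both computations give (5, 1, 3).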


Example 14.2. Let $x_1, \ldots, x_k \in R$. Let $R[X]_{<k}$ be the set of polynomials $g \in R[X]$
with $\deg(g) < k$, which is an R-module with a basis $S := \{X^{i-1}\}_{i=1}^k$ (see Example 13.29). The multi-point evaluation map

$$\rho : R[X]_{<k} \to R^{1\times k}, \qquad g \mapsto (g(x_1), \ldots, g(x_k))$$

is an R-linear map. Let T be the standard basis for $R^{1\times k}$. Then the matrix of ρ
relative to S and T is the k × k matrix

$$A = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_k \\
x_1^2 & x_2^2 & \cdots & x_k^2 \\
\vdots & \vdots & & \vdots \\
x_1^{k-1} & x_2^{k-1} & \cdots & x_k^{k-1}
\end{pmatrix}.$$

The matrix A is called a Vandermonde matrix. □
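A Python sketch of this example over $\mathbb{Z}_p$ (our choice of ring; the points and the polynomial are hypothetical): the coefficient vector of g times the Vandermonde matrix should equal the vector of values $(g(x_1), \ldots, g(x_k))$, which is computed independently here by Horner's rule.

```python
def vandermonde(xs, p):
    """k x k matrix whose row i (0-based) is (x_1^i, ..., x_k^i) mod p."""
    k = len(xs)
    return [[pow(x, i, p) for x in xs] for i in range(k)]

def evaluate(g, x, p):
    """Horner's rule for g = g[0] + g[1] X + ... over Z_p."""
    acc = 0
    for c in reversed(g):
        acc = (acc * x + c) % p
    return acc

def vec_mat(v, A, p):
    """Row vector v times matrix A over Z_p."""
    return [sum(v[i] * A[i][j] for i in range(len(A))) % p for j in range(len(A[0]))]

p = 11
xs = [0, 1, 2, 5]
g = [3, 0, 7, 1]                 # g = 3 + 7X^2 + X^3, so deg(g) < k = 4
A = vandermonde(xs, p)
print(vec_mat(g, A, p) == [evaluate(g, x, p) for x in xs])
```

It prints `True`; the common value is (3, 0, 6, 6).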


Exercise 14.2. Let $\sigma : M \to N$ and $\tau : N \to P$ be R-linear maps, and suppose
that M, N, and P have bases S, T, and U, respectively. Show that

$$\mathrm{Mat}_{S,U}(\tau \circ \sigma) = \mathrm{Mat}_{S,T}(\sigma) \cdot \mathrm{Mat}_{T,U}(\tau).$$

Exercise 14.3. Let V be a vector space over a field F with basis $S = \{\alpha_i\}_{i=1}^m$.
Suppose that U is a subspace of V of dimension ℓ < m. Show that there exists
a matrix $A \in F^{m\times(m-\ell)}$ such that for all $\alpha \in V$, we have $\alpha \in U$ if and only if
$\mathrm{Vec}_S(\alpha)A$ is zero. Such a matrix A is called a parity check matrix for U.

Exercise 14.4. Let F be a finite field, and let A be a non-zero m × n matrix over
F. Suppose one chooses a vector $v \in F^{1\times m}$ at random. Show that the probability
that vA is the zero vector is at most 1/|F|.
Exercise 14.5. Design and analyze a probabilistic algorithm that takes as input
matrices $A, B, C \in \mathbb{Z}_p^{m\times m}$, where p is a prime. The algorithm should run in time
$O(m^2\,\mathrm{len}(p)^2)$ and should output either “yes” or “no” so that the following holds:

• if C = AB, then the algorithm should always output “yes”;

• if C ≠ AB, then the algorithm should output “no” with probability at least
0.999.


14.3 The inverse of a matrix 

Let R be a ring. 

For a square matrix $A \in R^{n\times n}$, we call a matrix $B \in R^{n\times n}$ an inverse of A if
BA = AB = I, where I is the n × n identity matrix. It is easy to see that if A has
an inverse, then the inverse is unique: if B and C are inverses of A, then

$$B = BI = B(AC) = (BA)C = IC = C.$$

Because the inverse of A is uniquely determined, we denote it by $A^{-1}$. If A has an
inverse, we say that A is invertible, or non-singular. If A is not invertible, it is
sometimes called singular. We will use the terms “invertible” and “not invertible.”
Observe that A is the inverse of $A^{-1}$; that is, $(A^{-1})^{-1} = A$.

If A and B are invertible n × n matrices, then so is their product: in fact, it is
easily verified that $(AB)^{-1} = B^{-1}A^{-1}$. It follows that if A is an invertible matrix,
and k is a non-negative integer, then $A^k$ is invertible with inverse $(A^{-1})^k$, which
we also denote by $A^{-k}$.

It is also easy to see that A is invertible if and only if the transposed matrix $A^T$
is invertible, in which case $(A^T)^{-1} = (A^{-1})^T$. Indeed, AB = I = BA holds if and
only if $B^T A^T = I = A^T B^T$.
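To illustrate the rules $(AB)^{-1} = B^{-1}A^{-1}$ and $(A^T)^{-1} = (A^{-1})^T$ numerically, the Python sketch below works with 2 × 2 matrices over $\mathbb{Z}_p$ and inverts them with the familiar adjugate formula $\det(A)^{-1}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. That formula relies on determinants, which this text does not develop, so treat it as an outside assumption used only for the check; `pow(x, -1, p)` needs Python 3.8 or later.

```python
def mat_mul(A, B, p):
    """Matrix product over Z_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(A, p):
    """Inverse of a 2 x 2 matrix over Z_p (p prime) via the adjugate formula; None if singular."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        return None
    t = pow(det, -1, p)
    return [[d * t % p, -b * t % p], [-c * t % p, a * t % p]]

p = 7
I = [[1, 0], [0, 1]]
A = [[1, 2], [3, 4]]
B = [[2, 1], [1, 1]]
AB = mat_mul(A, B, p)
print(mat_mul(A, inverse_2x2(A, p), p) == I)
print(inverse_2x2(AB, p) == mat_mul(inverse_2x2(B, p), inverse_2x2(A, p), p))
print(inverse_2x2(transpose(A), p) == transpose(inverse_2x2(A, p)))
```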

We now develop a connection between invertible matrices and R-module isomorphisms. Recall from the previous section the R-module isomorphism

$$\Lambda : R^{n\times n} \to \mathrm{Hom}_R(R^{1\times n}, R^{1\times n}), \qquad A \mapsto \lambda_A,$$

where for each $A \in R^{n\times n}$, $\lambda_A$ is the corresponding R-linear map

$$\lambda_A : R^{1\times n} \to R^{1\times n}, \qquad v \mapsto vA.$$

Evidently, $\lambda_I$ is the identity map.

Theorem 14.5. Let $A \in R^{n\times n}$, and let $\lambda_A : R^{1\times n} \to R^{1\times n}$ be the corresponding
R-linear map. Then A is invertible if and only if $\lambda_A$ is bijective, in which case
$\lambda_{A^{-1}} = \lambda_A^{-1}$.

Proof. Suppose A is invertible, and that B is its inverse. We have AB = BA = I,
and hence $\lambda_{AB} = \lambda_{BA} = \lambda_I$, from which it follows (see (14.1)) that $\lambda_B \circ \lambda_A = \lambda_A \circ \lambda_B = \lambda_I$. Since $\lambda_I$ is the identity map, this implies $\lambda_A$ is bijective.

Suppose $\lambda_A$ is bijective. We know that the inverse map $\lambda_A^{-1}$ is also an R-linear
map, and since the mapping Λ above is surjective, we have $\lambda_A^{-1} = \lambda_B$ for some
$B \in R^{n\times n}$. Therefore, we have $\lambda_B \circ \lambda_A = \lambda_A \circ \lambda_B = \lambda_I$, and hence (again,
see (14.1)) $\lambda_{AB} = \lambda_{BA} = \lambda_I$. Since the mapping Λ is injective, it follows that
AB = BA = I. This implies A is invertible, with $A^{-1} = B$. □

We also have: 

Theorem 14.6. Let $A \in R^{n\times n}$. The following are equivalent:

(i) A is invertible;

(ii) $\{\mathrm{Row}_i(A)\}_{i=1}^n$ is a basis for $R^{1\times n}$;

(iii) $\{\mathrm{Col}_j(A)\}_{j=1}^n$ is a basis for $R^{n\times 1}$.

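Proof. We first prove the equivalence of (i) and (ii). By the previous theorem,
A is invertible if and only if $\lambda_A$ is bijective. Also, in the previous section, we
observed that $\lambda_A$ is surjective if and only if $\{\mathrm{Row}_i(A)\}_{i=1}^n$ spans $R^{1\times n}$, and that $\lambda_A$
is injective if and only if $\{\mathrm{Row}_i(A)\}_{i=1}^n$ is linearly independent. Hence $\lambda_A$ is bijective
if and only if $\{\mathrm{Row}_i(A)\}_{i=1}^n$ is a basis for $R^{1\times n}$.

The equivalence of (i) and (iii) follows by considering the transpose of A (recall that
A is invertible if and only if $A^T$ is). □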

Exercise 14.6 . Let R be a ring, and let A be a square matrix over R. Let us call 
B a left inverse of A if BA = I, and let us call C a right inverse of A if AC = I. 

(a) Show that if A has both a left inverse B and a right inverse C, then B = C 
and hence A is invertible. 

(b) Assume that R is a field. Show that if A has either a left inverse or a right 
inverse, then A is invertible. 

Note that part (b) of the previous exercise holds for arbitrary rings, but the proof 
of this is non-trivial, and requires the development of the theory of determinants, 
which we do not cover in this text. 

Exercise 14.7. Show that if A and B are two square matrices over a field such
that their product AB is invertible, then both A and B themselves must be invertible.

Exercise 14.8. Show that if A is a square matrix over an arbitrary ring, and $A^k$
is invertible for some k > 0, then A is invertible.

Exercise 14.9. With notation as in Example 14.1, show that the matrix A is
invertible if and only if $c_0 \in R^*$.
Exercise 14.10. With notation as in Example 14.2, show that the matrix A is
invertible if and only if $x_i - x_j \in R^*$ for all $i \ne j$.


14.4 Gaussian elimination 

Throughout this section, F denotes a field. 

A matrix $B \in F^{m\times n}$ is said to be in reduced row echelon form if there exists a
sequence of integers $(p_1, \ldots, p_r)$, with $0 \le r \le m$ and $1 \le p_1 < p_2 < \cdots < p_r \le n$,
such that the following holds:

• for $i = 1, \ldots, r$, all of the entries in row i of B to the left of entry $(i, p_i)$ are
zero; that is, $B(i, j) = 0_F$ for $j = 1, \ldots, p_i - 1$;

• for $i = 1, \ldots, r$, all of the entries in column $p_i$ of B above entry $(i, p_i)$ are
zero; that is, $B(i', p_i) = 0_F$ for $i' = 1, \ldots, i - 1$;

• for $i = 1, \ldots, r$, we have $B(i, p_i) = 1_F$;

• all entries in rows $r + 1, \ldots, m$ of B are zero; that is, $B(i, j) = 0_F$ for
$i = r + 1, \ldots, m$ and $j = 1, \ldots, n$.

It is easy to see that if B is in reduced row echelon form, then the sequence
$(p_1, \ldots, p_r)$ above is uniquely determined, and we call it the pivot sequence of B.
Several further remarks are in order:

• All of the entries of B are completely determined by the pivot sequence,
except for the entries (i, j) with $1 \le i \le r$ and $j > p_i$ with $j \notin \{p_{i+1}, \ldots, p_r\}$,
which may be arbitrary.

• If B is an n × n matrix in reduced row echelon form whose pivot sequence
is of length n, then B must be the n × n identity matrix.

• We allow for an empty pivot sequence (i.e., r = 0), which will be the case
precisely when $B = 0_F^{m\times n}$.

Example 14.3. The following 4 × 6 matrix B over the rational numbers is in reduced
row echelon form:

$$B = \begin{pmatrix}
0 & 1 & -2 & 0 & 0 & 3 \\
0 & 0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & -4 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

The pivot sequence of B is (2, 4, 5). Notice that the first three rows of B form a
linearly independent family of vectors, that columns 2, 4, and 5 form a linearly
independent family of vectors, and that all of the other columns of B are linear combinations of columns 2, 4, and 5. Indeed, if we truncate the pivot columns to their
first three rows, we get the 3 × 3 identity matrix. □
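A Python sketch (the function name is ours) that tests the conditions of the definition and returns the pivot sequence, 1-based as in the text, or None if B is not in reduced row echelon form. Only the "above the pivot" condition is checked within pivot columns, since entries below a pivot are forced to be zero by the other conditions.

```python
def pivot_sequence(B):
    """Return the pivot sequence of B (1-based), or None if B is not in
    reduced row echelon form. Works for int or Fraction entries."""
    pivots = []                                   # 0-based pivot columns found so far
    for i, row in enumerate(B):
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if not nonzero:
            # first zero row: every later row must be zero as well
            if any(x != 0 for later in B[i:] for x in later):
                return None
            break
        lead = nonzero[0]                         # position of the leading entry
        if row[lead] != 1:
            return None                           # pivot entry must equal 1
        if pivots and lead <= pivots[-1]:
            return None                           # pivots must move strictly right
        if any(B[k][lead] != 0 for k in range(i)):
            return None                           # entries above the pivot must be 0
        pivots.append(lead)
    return tuple(j + 1 for j in pivots)

B = [[0, 1, -2, 0, 0, 3],
     [0, 0, 0, 1, 0, 2],
     [0, 0, 0, 0, 1, -4],
     [0, 0, 0, 0, 0, 0]]
print(pivot_sequence(B))
```

On the matrix of Example 14.3 it prints `(2, 4, 5)`.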
Generalizing the previous example, if a matrix is in reduced row echelon form,
it is easy to deduce the following properties, which turn out to be quite useful:

Theorem 14.7. If B is a matrix in reduced row echelon form with pivot sequence
$(p_1, \ldots, p_r)$, then:

(i) rows 1, 2, ..., r of B form a linearly independent family of vectors;

(ii) columns $p_1, \ldots, p_r$ of B form a linearly independent family of vectors, and
all other columns of B can be expressed as linear combinations of columns
$p_1, \ldots, p_r$.

Proof. Exercise: just look at the matrix! □

Gaussian elimination is an algorithm that transforms a given matrix $A \in F^{m\times n}$
into a matrix $B \in F^{m\times n}$, where B is in reduced row echelon form, and is obtained
from A by a sequence of elementary row operations. There are three types of 
elementary row operations: 

Type I: swap two rows; 

Type II: multiply a row by a non-zero scalar; 

Type III: add a scalar multiple of one row to a different row. 

The application of any specific elementary row operation to an m × n matrix
C can be effected by multiplying C on the left by a suitable m × m matrix X.
Indeed, the matrix X corresponding to a particular elementary row operation is
simply the matrix obtained by applying the same elementary row operation to the
m × m identity matrix. It is easy to see that for every elementary row operation, the
corresponding matrix X is invertible.
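The claim that each elementary row operation is left multiplication by the matrix obtained from the identity can be checked in Python with exact rational arithmetic via `fractions.Fraction` (standing in for $F = \mathbb{Q}$). Rows are 0-indexed here, unlike the text, and the helper names are ours.

```python
from fractions import Fraction

def identity(m):
    return [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]

def swap_rows(C, r, s):
    """Type I: swap rows r and s (returns a new matrix)."""
    C = [row[:] for row in C]
    C[r], C[s] = C[s], C[r]
    return C

def scale_row(C, r, c):
    """Type II: multiply row r by the non-zero scalar c."""
    C = [row[:] for row in C]
    C[r] = [c * x for x in C[r]]
    return C

def add_multiple(C, r, s, c):
    """Type III: Row_r <- Row_r + c * Row_s, where r != s."""
    C = [row[:] for row in C]
    C[r] = [x + c * y for x, y in zip(C[r], C[s])]
    return C

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

C = [[Fraction(x) for x in row] for row in [[1, 2], [3, 4], [5, 6]]]   # 3 x 2
m = len(C)
operations = [lambda M: swap_rows(M, 0, 2),
              lambda M: scale_row(M, 1, Fraction(-3, 2)),
              lambda M: add_multiple(M, 2, 0, Fraction(5))]
for op in operations:
    X = op(identity(m))              # the same operation applied to the m x m identity
    assert mat_mul(X, C) == op(C)    # left multiplication by X performs the operation
print("ok")
```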

We now describe the basic version of Gaussian elimination. The input is an m × n
matrix A, and the algorithm is described in Fig. 14.1.

The algorithm works as follows. First, it makes a copy $B$ of $A$ (this is not necessary if the original matrix $A$ is not needed afterwards). The algorithm proceeds column by column, starting with the left-most column, so that after processing column $j$, the first $j$ columns of $B$ are in reduced row echelon form, and the current value of $r$ represents the length of the pivot sequence. To process column $j$, in steps 3-6 the algorithm first searches for a non-zero element among $B(r+1, j), \ldots, B(m, j)$; if none is found, then the first $j$ columns of $B$ are already in reduced row echelon form. Otherwise, one of these non-zero elements is selected as the pivot element (the choice is arbitrary), which is then used in steps 8-13 to bring column $j$ into the required form. After incrementing $r$, the pivot element is brought into position $(r, j)$, using a Type I operation in step 9. Then the entry $(r, j)$ is set to $1_F$, using a Type II operation in step 10. Finally, all the entries above and below entry $(r, j)$ are set to $0_F$, using Type III operations in steps 11-13. Note that because columns $1, \ldots, j-1$ of $B$ were already in reduced row echelon form, none of these operations changes any values in these columns.

    1.  B ← A, r ← 0
    2.  for j ← 1 to n do
    3.      l ← 0, i ← r
    4.      while l = 0 and i < m do
    5.          i ← i + 1
    6.          if B(i, j) ≠ 0_F then l ← i
    7.      if l ≠ 0 then
    8.          r ← r + 1
    9.          swap rows r and l of B
    10.         Row_r(B) ← B(r, j)^(−1) Row_r(B)
    11.         for i ← 1 to m do
    12.             if i ≠ r then
    13.                 Row_i(B) ← Row_i(B) − B(i, j) Row_r(B)
    14. output B

Fig. 14.1. Gaussian elimination
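For readers who want to experiment, here is a minimal Python sketch of Fig. 14.1. It assumes the field is $\mathbb{Z}_p$ for a prime $p$, with elements represented by integers in $\{0, \ldots, p-1\}$ and matrices by lists of rows; the function name `gaussian_elimination` is our own. Indices are 0-based, so the increment of $r$ (step 8) is done after the elimination, and the pivot chosen is the first non-zero candidate, which is one valid instance of the "arbitrary" choice.

```python
def gaussian_elimination(A, p):
    """Return the reduced row echelon form of A over Z_p (p prime), following Fig. 14.1.

    A is a list of m rows, each a list of n integers. Indices are 0-based, so r
    counts the pivots found so far and is incremented at the end (step 8).
    """
    m, n = len(A), len(A[0])
    B = [[a % p for a in row] for row in A]          # step 1: B <- A
    r = 0
    for j in range(n):                               # step 2
        # steps 3-6: first non-zero entry among B[r][j], ..., B[m-1][j]
        l = next((i for i in range(r, m) if B[i][j] != 0), None)
        if l is None:                                # step 7: no pivot in column j
            continue
        B[r], B[l] = B[l], B[r]                      # step 9: Type I operation
        inv = pow(B[r][j], -1, p)                    # step 10: Type II, pivot becomes 1
        B[r] = [b * inv % p for b in B[r]]
        for i in range(m):                           # steps 11-13: Type III operations
            if i != r:
                c = B[i][j]
                B[i] = [(bi - c * br) % p for bi, br in zip(B[i], B[r])]
        r += 1                                       # step 8
    return B
```

Applied to the matrix of Example 14.4 with $p = 3$, that is `gaussian_elimination([[0, 1, 1], [2, 1, 2], [2, 2, 0]], 3)`, it returns `[[1, 0, 2], [0, 1, 1], [0, 0, 0]]`. The call `pow(x, -1, p)` computes a modular inverse and needs Python 3.8 or later.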

As for the complexity of the algorithm, it is easy to see that it performs $O(mn)$ elementary row operations, each of which takes $O(n)$ operations in $F$, so a total of $O(mn^2)$ operations in $F$.


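Example 14.4. Consider the execution of the Gaussian elimination algorithm on input

$$A = \begin{pmatrix} [0] & [1] & [1] \\ [2] & [1] & [2] \\ [2] & [2] & [0] \end{pmatrix} \in \mathbb{Z}_3^{3 \times 3}.$$

After copying $A$ into $B$, the algorithm transforms $B$ as follows:

$$\begin{pmatrix} [0] & [1] & [1] \\ [2] & [1] & [2] \\ [2] & [2] & [0] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftrightarrow \mathrm{Row}_2}
\begin{pmatrix} [2] & [1] & [2] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftarrow [2]\,\mathrm{Row}_1}
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [2] & [2] & [0] \end{pmatrix}$$

$$\xrightarrow{\mathrm{Row}_3 \leftarrow \mathrm{Row}_3 - [2]\,\mathrm{Row}_1}
\begin{pmatrix} [1] & [2] & [1] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftarrow \mathrm{Row}_1 - [2]\,\mathrm{Row}_2}
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [1] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_3 \leftarrow \mathrm{Row}_3 - \mathrm{Row}_2}
\begin{pmatrix} [1] & [0] & [2] \\ [0] & [1] & [1] \\ [0] & [0] & [0] \end{pmatrix}$$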

Suppose the Gaussian elimination algorithm performs a total of $t$ elementary row operations. Then as discussed above, the application of the $e$th elementary row operation, for $e = 1, \ldots, t$, amounts to multiplying the current value of the matrix $B$ on the left by a particular invertible $m \times m$ matrix $X_e$. Therefore, the final output value of $B$ satisfies the equation

$$B = XA, \quad \text{where } X = X_t X_{t-1} \cdots X_1.$$

Since the product of invertible matrices is also invertible, we see that $X$ itself is invertible.
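The identity $B = XA$ can be checked mechanically on Example 14.4. The Python sketch below (helper names are our own; arithmetic over $\mathbb{Z}_p$ with entries in $\{0, \ldots, p-1\}$, rows 0-based) builds the matrix of each elementary row operation by applying that operation to the identity matrix, as described earlier in this section, and multiplies these matrices on the left in order. The list `EXAMPLE_OPS` is our encoding of the five operations shown in Example 14.4.

```python
def identity(m):
    return [[int(i == k) for k in range(m)] for i in range(m)]

def mat_mul(P, Q, p):
    """Product P*Q with entries reduced modulo p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*Q)] for row in P]

def elementary_matrix(m, op, p):
    """Apply one elementary row operation to the m x m identity matrix over Z_p."""
    E = identity(m)
    if op[0] == "swap":                     # Type I: swap rows i and k
        _, i, k = op
        E[i], E[k] = E[k], E[i]
    elif op[0] == "scale":                  # Type II: multiply row i by non-zero c
        _, i, c = op
        E[i] = [c * x % p for x in E[i]]
    elif op[0] == "add":                    # Type III: add c times row k to row i (i != k)
        _, i, k, c = op
        E[i] = [(x + c * y) % p for x, y in zip(E[i], E[k])]
    else:
        raise ValueError(op)
    return E

def accumulate(m, ops, p):
    """Return X = X_t X_{t-1} ... X_1 for ops = [op_1, ..., op_t]."""
    X = identity(m)
    for op in ops:
        X = mat_mul(elementary_matrix(m, op, p), X, p)   # multiply on the left
    return X

# The five operations of Example 14.4 (0-based rows, entries in Z_3).
EXAMPLE_OPS = [("swap", 0, 1), ("scale", 0, 2), ("add", 2, 0, -2),
               ("add", 0, 1, -2), ("add", 2, 1, -1)]
```

Here `accumulate(3, EXAMPLE_OPS, 3)` yields the matrix $X$ that appears at the end of Example 14.5, and multiplying it by $A$ reproduces the final $B$ of Example 14.4.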

Although the algorithm as presented does not compute the matrix $X$, it can be easily modified to do so. The resulting algorithm, which we call extended Gaussian elimination, is the same as plain Gaussian elimination, except that we initialize the matrix $X$ to be the $m \times m$ identity matrix, and we add the following steps:

• just before step 9: swap rows $r$ and $l$ of $X$;

• just before step 10: $\mathrm{Row}_r(X) \leftarrow B(r, j)^{-1}\,\mathrm{Row}_r(X)$;

• just before step 13: $\mathrm{Row}_i(X) \leftarrow \mathrm{Row}_i(X) - B(i, j)\,\mathrm{Row}_r(X)$.

At the end of the algorithm we output $X$ in addition to $B$.

So we simply perform the same elementary row operations on $X$ that we perform on $B$. The reader may verify that the above algorithm is correct, and that it uses $O(mn(m + n))$ operations in $F$.
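A minimal Python sketch of extended Gaussian elimination, under the same assumptions as before (field $\mathbb{Z}_p$ with $p$ prime, entries in $\{0, \ldots, p-1\}$, 0-based indices, hypothetical function name). Every row operation applied to $B$ is mirrored on $X$, and $B(i, j)$ is read before row $i$ of $B$ changes, which is why the text places the new step just before step 13.

```python
def extended_gaussian_elimination(A, p):
    """Return (X, B) with X*A = B over Z_p (p prime), X invertible, B in reduced row echelon form.

    Same loop as plain Gaussian elimination; X starts as the m x m identity
    matrix and undergoes exactly the same row operations as B.
    """
    m, n = len(A), len(A[0])
    B = [[a % p for a in row] for row in A]
    X = [[int(i == k) for k in range(m)] for i in range(m)]
    r = 0
    for j in range(n):
        l = next((i for i in range(r, m) if B[i][j] != 0), None)
        if l is None:
            continue
        B[r], B[l] = B[l], B[r]                      # swap rows r and l of B ...
        X[r], X[l] = X[l], X[r]                      # ... and of X
        inv = pow(B[r][j], -1, p)
        B[r] = [b * inv % p for b in B[r]]
        X[r] = [x * inv % p for x in X[r]]
        for i in range(m):
            if i != r:
                c = B[i][j]                          # read B(i, j) before row i changes
                B[i] = [(bi - c * br) % p for bi, br in zip(B[i], B[r])]
                X[i] = [(xi - c * xr) % p for xi, xr in zip(X[i], X[r])]
        r += 1
    return X, B
```

On the matrix of Example 14.4 with $p = 3$ this returns the $X$ computed in Example 14.5 together with the $B$ of Example 14.4; on an invertible square matrix, $X$ is its inverse.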

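Example 14.5. Continuing with Example 14.4, the execution of the extended Gaussian elimination algorithm initializes $X$ to the identity matrix, and then transforms $X$ as follows:

$$\begin{pmatrix} [1] & [0] & [0] \\ [0] & [1] & [0] \\ [0] & [0] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftrightarrow \mathrm{Row}_2}
\begin{pmatrix} [0] & [1] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftarrow [2]\,\mathrm{Row}_1}
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [0] & [1] \end{pmatrix}$$

$$\xrightarrow{\mathrm{Row}_3 \leftarrow \mathrm{Row}_3 - [2]\,\mathrm{Row}_1}
\begin{pmatrix} [0] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_1 \leftarrow \mathrm{Row}_1 - [2]\,\mathrm{Row}_2}
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [0] & [2] & [1] \end{pmatrix}
\xrightarrow{\mathrm{Row}_3 \leftarrow \mathrm{Row}_3 - \mathrm{Row}_2}
\begin{pmatrix} [1] & [2] & [0] \\ [1] & [0] & [0] \\ [2] & [2] & [1] \end{pmatrix}$$

□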


Exercise 14.11. For each type of elementary row operation, describe the matrix $X$ which corresponds to it, as well as $X^{-1}$.

Exercise 14.12. Given a matrix $B \in F^{m \times n}$ in reduced row echelon form, show how to compute its pivot sequence using $O(n)$ operations in $F$.

Exercise 14.13. In §4.4, we saw how to speed up matrix multiplication over $\mathbb{Z}$ using the Chinese remainder theorem. In this exercise, you are to do the same, but for performing Gaussian elimination over $\mathbb{Z}_p$, where $p$ is a large prime. Suppose you are given an $m \times m$ matrix $A$ over $\mathbb{Z}_p$, where $\mathrm{len}(p) = \Theta(m)$. Straightforward application of Gaussian elimination would require $O(m^3)$ operations in $\mathbb{Z}_p$, each of which takes time $O(m^2)$, leading to a total running time of $O(m^5)$. Show how to use the techniques of §4.4 to reduce the running time of Gaussian elimination to $O(m^4)$.


14.5 Applications of Gaussian elimination 

Throughout this section, $A$ is an arbitrary $m \times n$ matrix over a field $F$, and $XA = B$, where $X$ is an invertible $m \times m$ matrix, and $B$ is an $m \times n$ matrix in reduced row echelon form with pivot sequence $(p_1, \ldots, p_r)$. This is precisely the information produced by the extended Gaussian elimination algorithm, given $A$ as input (the pivot sequence can easily be “read” directly from $B$; see Exercise 14.12). Also, let

$$\lambda_A : F^{1 \times m} \to F^{1 \times n}, \quad v \mapsto vA$$

be the linear map corresponding to $A$.


Computing the image and kernel 

Consider first the row space of $A$, by which we mean the subspace of $F^{1 \times n}$ spanned by $\{\mathrm{Row}_i(A)\}_{i=1}^m$, or equivalently, the image of $\lambda_A$.

We claim that the row space of $A$ is the same as the row space of $B$. To see this, note that since $B = XA$, for every $v \in F^{1 \times m}$, we have $vB = v(XA) = (vX)A$, and so the row space of $B$ is contained in the row space of $A$. For the other containment, note that since $X$ is invertible, we can write $A = X^{-1}B$, and apply the same argument.

Further, note that the row space of $B$, and hence that of $A$, clearly has dimension $r$. Indeed, as stated in Theorem 14.7, rows $1, \ldots, r$ of $B$ form a basis for the row space of $B$.

Consider next the kernel $K$ of $\lambda_A$, or what we might call the row null space of $A$. We claim that $\{\mathrm{Row}_i(X)\}_{i=r+1}^m$ is a basis for $K$. Clearly, just from the fact that $XA = B$ and the fact that rows $r+1, \ldots, m$ of $B$ are zero, it follows that rows $r+1, \ldots, m$ of $X$ are contained in $K$. Furthermore, as $X$ is invertible, $\{\mathrm{Row}_i(X)\}_{i=1}^m$ is a basis for $F^{1 \times m}$ (see Theorem 14.6). Thus, the family of vectors $\{\mathrm{Row}_i(X)\}_{i=r+1}^m$ is linearly independent and spans a subspace $K'$ of $K$. It suffices to show that $K' = K$. Suppose to the contrary that $K' \subsetneq K$, and let $v \in K \setminus K'$. As $\{\mathrm{Row}_i(X)\}_{i=1}^m$ spans $F^{1 \times m}$, we may write $v = \sum_{i=1}^m c_i\,\mathrm{Row}_i(X)$; moreover, as $v \notin K'$, we must have $c_i \neq 0_F$ for some $i = 1, \ldots, r$. Setting $\tilde{v} := (c_1, \ldots, c_m)$, we see that $v = \tilde{v}X$, and so

$$\lambda_A(v) = vA = (\tilde{v}X)A = \tilde{v}(XA) = \tilde{v}B.$$

Furthermore, since $\{\mathrm{Row}_i(B)\}_{i=1}^r$ is linearly independent, rows $r+1, \ldots, m$ of $B$ are zero, and $\tilde{v}$ has a non-zero entry in one of its first $r$ positions, we see that $\tilde{v}B$, and hence $\lambda_A(v)$, is not the zero vector, contradicting $v \in K$. We may therefore conclude that $K' = K$.

Finally, note that if $m = n$, then $A$ is invertible if and only if its row space has dimension $m$, which holds if and only if $r = m$, and in the latter case, $B$ is the identity matrix, and hence $X$ is the inverse of $A$.

Let us summarize the above discussion:

• The first $r$ rows of $B$ form a basis for the row space of $A$ (i.e., the image of $\lambda_A$).

• The last $m - r$ rows of $X$ form a basis for the row null space of $A$ (i.e., the kernel of $\lambda_A$).

• If $m = n$, then $A$ is invertible (i.e., $\lambda_A$ is an isomorphism) if and only if $r = m$, in which case $X$ is the inverse of $A$ (i.e., the matrix of $\lambda_A^{-1}$ relative to the standard basis).
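Once $X$ and $B$ are available, the summary above amounts to indexing. A minimal sketch, assuming matrices over $\mathbb{Z}_p$ stored as lists of rows (the function name `read_off` is hypothetical); the rank $r$ is taken to be the number of non-zero rows of $B$, which for a matrix in reduced row echelon form is the length of its pivot sequence.

```python
def read_off(X, B):
    """Given X*A = B with B in reduced row echelon form, return
    (basis of the row space, basis of the row null space, inverse of A or None)."""
    m, n = len(B), len(B[0])
    r = sum(1 for row in B if any(row))      # non-zero rows = length of pivot sequence
    row_space = [row[:] for row in B[:r]]    # first r rows of B
    null_space = [row[:] for row in X[r:]]   # last m - r rows of X
    inverse = [row[:] for row in X] if (m == n and r == m) else None
    return row_space, null_space, inverse
```

With the matrices $X$ and $B$ of Examples 14.4 and 14.5 this returns the two row-space vectors and the single row-null-space vector listed in Example 14.6, and no inverse, since $r = 2 < 3$.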

So we see that from the output of the extended Gaussian elimination algorithm, we can simply “read off” bases for both the image and the kernel, as well as the inverse (if it exists), of a linear map represented as a matrix with respect to given bases. Also note that this procedure provides a “constructive” version of Theorem 13.28.

Example 14.6. Continuing with Examples 14.4 and 14.5, we see that the vectors 
([1], [0], [2]) and ([0], [1], [1]) form a basis for the row space of A, while the vector 
([2], [2], [1]) is a basis for the row null space of A. □ 





Solving systems of linear equations 

Suppose that in addition to the matrix $A$, we are given $w \in F^{1 \times n}$, and want to find a solution $v \in F^{1 \times m}$ (or perhaps describe all solutions) to the equation

$$vA = w. \tag{14.3}$$

Equivalently, we can phrase the problem as finding an element (or describing all elements) of the set $\lambda_A^{-1}(\{w\})$.

Now, if there exists a solution at all, say $v \in F^{1 \times m}$, then $\lambda_A(v) = \lambda_A(v')$ if and only if $v \equiv v' \pmod{K}$, where $K$ is the kernel of $\lambda_A$. It follows that the set of all solutions to (14.3) is $v + K = \{v + v_0 : v_0 \in K\}$. Thus, given a basis for $K$ and any solution $v$ to (14.3), we have a complete and concise description of the set of solutions to (14.3).

As we have discussed above, the last $m - r$ rows of $X$ form a basis for $K$, so it suffices to determine if $w \in \operatorname{Im} \lambda_A$, and if so, determine a single pre-image $v$ of $w$.

Also as we discussed, $\operatorname{Im} \lambda_A$, that is, the row space of $A$, is equal to the row space of $B$, and because of the special form of $B$, we can quickly and easily determine if the given $w$ is in the row space of $B$, as follows. By definition, $w$ is in the row space of $B$ if and only if there exists a vector $\hat{v} \in F^{1 \times m}$ such that $\hat{v}B = w$. We may as well assume that all but the first $r$ entries of $\hat{v}$ are zero. Moreover, since column $p_i$ of $B$ is the $i$th standard unit vector, $\hat{v}B = w$ implies that for $i = 1, \ldots, r$, the $i$th entry of $\hat{v}$ is equal to the $p_i$th entry of $w$. Thus, the vector $\hat{v}$, if it exists, is completely determined by the entries of $w$ at positions $p_1, \ldots, p_r$. We can construct $\hat{v}$ satisfying these conditions, and then test if $\hat{v}B = w$. If not, then we may conclude that (14.3) has no solutions; otherwise, setting $v := \hat{v}X$, we see that $vA = (\hat{v}X)A = \hat{v}(XA) = \hat{v}B = w$, and so $v$ is a solution to (14.3).

One easily verifies that if we implement the above procedure as an algorithm, the work done in addition to running the extended Gaussian elimination algorithm amounts to $O(m(n + m))$ operations in $F$.

A special case of the above procedure is when $m = n$ and $A$ is invertible, in which case (14.3) has a unique solution, namely, $v := wX$, since in this case, $X = A^{-1}$.
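The procedure for solving (14.3) can be sketched in Python as follows, assuming $F = \mathbb{Z}_p$ and that `X`, `B` come from extended Gaussian elimination on $A$ (the function name is hypothetical). It builds $\hat{v}$ from the entries of $w$ at the pivot positions, tests $\hat{v}B = w$, and returns $v = \hat{v}X$, or `None` when $w$ is not in the row space.

```python
def solve_vA_eq_w(X, B, w, p):
    """Return some v with v*A = w over Z_p, or None if no solution exists.

    X and B are the outputs of extended Gaussian elimination on A (X*A = B).
    """
    m, n = len(B), len(B[0])
    r = sum(1 for row in B if any(row))
    pivots = [next(j for j in range(n) if B[i][j]) for i in range(r)]   # p_1, ..., p_r
    # v_hat is zero beyond position r; its i-th entry is the p_i-th entry of w
    v_hat = [w[pivots[i]] % p for i in range(r)] + [0] * (m - r)
    v_hat_B = [sum(v_hat[i] * B[i][j] for i in range(m)) % p for j in range(n)]
    if v_hat_B != [x % p for x in w]:
        return None                                    # w is not in the row space of A
    return [sum(v_hat[i] * X[i][k] for i in range(m)) % p for k in range(m)]   # v = v_hat X
```

Every other solution differs from the returned one by an element of the span of the last $m - r$ rows of $X$.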


The rank of a matrix 

We define the row rank of $A$ to be the dimension of its row space, which is equal to $\dim_F(\operatorname{Im} \lambda_A)$. The column space of $A$ is defined as the subspace of $F^{m \times 1}$ spanned by $\{\mathrm{Col}_j(A)\}_{j=1}^n$; that is, the column space of $A$ is $\{Az : z \in F^{n \times 1}\}$. The column rank of $A$ is the dimension of its column space.

Now, the column space of $A$ need not be the same as the column space of $B$, but from the identity $B = XA$, and the fact that $X$ is invertible, it easily follows that these two subspaces are isomorphic (via the map that sends $y \in F^{m \times 1}$ to $Xy$), and hence have the same dimension. Moreover, by Theorem 14.7, the column rank of $B$ is $r$, which is the same as the row rank of $A$.

So we may conclude: The column rank and row rank of A are the same. 
Because of this, we may define the rank of a matrix to be the common value of 
its row and column rank. 
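The equality of row rank and column rank is easy to spot-check numerically. The sketch below assumes a prime $p$ and matrices as lists of rows, with function names of our own. It computes the rank over $\mathbb{Z}_p$ as the number of pivots produced by row reduction, then compares the rank of $A$ with that of its transpose on random small matrices. This only illustrates the statement; the argument above is what proves it.

```python
import random

def rank_mod_p(A, p):
    """Rank of A over Z_p (p prime): the number of pivots found by row reduction."""
    B = [[a % p for a in row] for row in A]
    m, n = len(B), len(B[0])
    r = 0
    for j in range(n):
        l = next((i for i in range(r, m) if B[i][j]), None)
        if l is None:
            continue
        B[r], B[l] = B[l], B[r]
        inv = pow(B[r][j], -1, p)
        B[r] = [b * inv % p for b in B[r]]
        for i in range(r + 1, m):        # clearing below the pivot suffices to count pivots
            c = B[i][j]
            B[i] = [(a - c * b) % p for a, b in zip(B[i], B[r])]
        r += 1
    return r

def transpose(A):
    return [list(col) for col in zip(*A)]

def check_row_rank_equals_column_rank(trials=200, seed=1):
    """Compare rank(A) with rank(A^T) on random small matrices over small prime fields."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.choice([2, 3, 5, 7])
        m, n = rng.randint(1, 5), rng.randint(1, 5)
        A = [[rng.randrange(p) for _ in range(n)] for _ in range(m)]
        if rank_mod_p(A, p) != rank_mod_p(transpose(A), p):
            return False
    return True
```

The row rank of $A^T$ is the column rank of $A$, so each comparison tests one instance of the statement above.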


The orthogonal complement of a subspace 

So as to give equal treatment to rows and columns, one can also define the column null space of $A$ to be the kernel of the linear map defined by multiplication on the left by $A$; that is, the column null space of $A$ is $\{z \in F^{n \times 1} : Az = 0^{m \times 1}\}$. By applying the results above to the transpose of $A$, we see that the column null space of $A$ has dimension $n - r$, where $r$ is the rank of $A$.

Let $U \subseteq F^{1 \times n}$ be the row space of $A$, and let $U^\perp \subseteq F^{1 \times n}$ denote the set of all vectors $\bar{u} \in F^{1 \times n}$ whose transpose $\bar{u}^T$ belongs to the column null space of $A$. Now, $U$ is a subspace of $F^{1 \times n}$ of dimension $r$ and $U^\perp$ is a subspace of $F^{1 \times n}$ of dimension $n - r$. The space $U^\perp$ consists precisely of all vectors $\bar{u} \in F^{1 \times n}$ that are “orthogonal” to all vectors $u \in U$, in the sense that the “inner product” $u\bar{u}^T$ is zero. For this reason, $U^\perp$ is sometimes called the “orthogonal complement of $U$.”

Clearly, $U^\perp$ is determined by the subspace $U$ itself, and does not depend on the particular choice of matrix $A$. It is also easy to see that the orthogonal complement of $U^\perp$ is $U$; that is, $(U^\perp)^\perp = U$. This follows immediately from the fact that $U \subseteq (U^\perp)^\perp$ and $\dim_F((U^\perp)^\perp) = n - \dim_F(U^\perp) = \dim_F(U)$.

Now suppose that $U \cap U^\perp = \{0\}$. Then by Theorem 13.11, we have an isomorphism of $U \times U^\perp$ with $U + U^\perp$, and since $U \times U^\perp$ has dimension $n$, it must be the case that $U + U^\perp = F^{1 \times n}$. It follows that every element of $F^{1 \times n}$ can be expressed uniquely as $u + \bar{u}$, where $u \in U$ and $\bar{u} \in U^\perp$.

We emphasize that the observations in the previous paragraph hinged on the assumption that $U \cap U^\perp = \{0\}$, which itself holds provided $U$ contains no non-zero “self-orthogonal vectors” $u$ such that $uu^T$ is zero. If $F$ is the field of real numbers, then of course there are no non-zero self-orthogonal vectors, since $uu^T$ is the sum of the squares of the entries of $u$. However, for other fields, there may very well be non-zero self-orthogonal vectors. As an example, if $F = \mathbb{Z}_2$, then any vector $u$ with an even number of 1-entries is self-orthogonal.
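A brute-force check over $F = \mathbb{Z}_2$ makes the failure concrete. The sketch below is only feasible for tiny $n$, and its helper names are ours. It takes $U$ to be the row space of the $1 \times 2$ matrix $A = ([1]\ [1])$, with field elements written as the integers 0 and 1. The vector $([1], [1])$ is self-orthogonal, $U^\perp = U$, and therefore $U \cap U^\perp \neq \{0\}$ and $U + U^\perp \neq F^{1 \times 2}$.

```python
from itertools import product

def dot(u, v, p):
    """The "inner product" u v^T over Z_p."""
    return sum(a * b for a, b in zip(u, v)) % p

def span(gens, p, n):
    """All Z_p-linear combinations of the vectors in gens (by brute force)."""
    return {tuple(sum(c * g[k] for c, g in zip(cs, gens)) % p for k in range(n))
            for cs in product(range(p), repeat=len(gens))}

def orthogonal_complement(gens, p, n):
    """All u in Z_p^{1 x n} with u g^T = 0 for every generator g (by brute force)."""
    return {u for u in product(range(p), repeat=n)
            if all(dot(u, g, p) == 0 for g in gens)}
```

For instance, `span([(1, 1)], 2, 2)` and `orthogonal_complement([(1, 1)], 2, 2)` are both `{(0, 0), (1, 1)}`, so $U + U^\perp = U$ has only two of the four vectors of $F^{1 \times 2}$.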

So we see that while much of the theory of vector spaces and matrices carries over without change from familiar ground fields, like the real numbers, to arbitrary ground fields $F$, not everything does. In particular, the usual decomposition of a vector space into a subspace and its orthogonal complement breaks down, as does any other procedure that relies on properties specific to “inner product spaces.”

For the following three exercises, as above, $A$ is an arbitrary $m \times n$ matrix over a field $F$, and $XA = B$, where $X$ is an invertible $m \times m$ matrix, and $B$ is in reduced row echelon form.

Exercise 14.14. Show that the column null space of A is the same as the column 
null space of B. 

Exercise 14.15. Show how to compute a basis for the column null space of $A$ using $O(r(n - r))$ operations in $F$, given $A$ and $B$.

Exercise 14.16. Show that the matrix $B$ is uniquely determined by $A$; more precisely, show that if $X'A = B'$, where $X'$ is an invertible $m \times m$ matrix, and $B'$ is in reduced row echelon form, then $B' = B$.

In the following two exercises, the theory of determinants could be used; however, they can both be solved directly, without too much difficulty, using just the ideas developed so far in the text.

Exercise 14.17. Let $p$ be a prime. A matrix $A \in \mathbb{Z}^{m \times m}$ is called invertible modulo $p$ if there exists a matrix $B \in \mathbb{Z}^{m \times m}$ such that $AB \equiv BA \equiv I \pmod{p}$, where $I$ is the $m \times m$ integer identity matrix. Here, two matrices are considered congruent with respect to a given modulus if their corresponding entries are congruent. Show that $A$ is invertible modulo $p$ if and only if $A$ is invertible over $\mathbb{Q}$, and the entries of $A^{-1}$ lie in $\mathbb{Q}^{(p)}$ (see Example 7.26).

Exercise 14.18. You are given a matrix $A \in \mathbb{Z}^{m \times m}$ and a prime $p$ such that $A$ is invertible modulo $p$ (see previous exercise). Suppose that you are also given $w \in \mathbb{Z}^{1 \times m}$.

(a) Show how to efficiently compute a vector $v \in \mathbb{Z}^{1 \times m}$ such that $vA \equiv w \pmod{p}$, and that $v$ is uniquely determined modulo $p$.

(b) Given a vector $v$ as in part (a), along with an integer $e > 1$, show how to efficiently compute $\hat{v} \in \mathbb{Z}^{1 \times m}$ such that $\hat{v}A \equiv w \pmod{p^e}$, and that $\hat{v}$ is uniquely determined modulo $p^e$. Hint: mimic the “lifting” procedure discussed in §12.5.2.

(c) Using parts (a) and (b), design and analyze an efficient algorithm that takes the matrix $A$ and the prime $p$ as input, together with a bound $H$ on the absolute value of the numerator and denominator of the entries of the vector $v'$ that is the unique (rational) solution to the equation $v'A = w$. Your algorithm should run in time polynomial in the length of $H$, the length of $p$, and the sum of the lengths of the entries of $A$ and $w$. Hint: use rational reconstruction, but be sure to fully justify its application.

Note that in the previous exercise, one can use the theory of determinants to derive good bounds, in terms of the lengths of the entries of $A$ and $w$, on the size of the least prime $p$ such that $A$ is invertible modulo $p$ (assuming $A$ is invertible over the rationals), and on the length of the numerator and denominator of the entries of the rational solution $v'$ to the equation $v'A = w$. The interested reader who is familiar with the basic theory of determinants is encouraged to establish such bounds.

The next two exercises illustrate how Gaussian elimination can be adapted, in certain cases, to work in rings that are not necessarily fields. Let $R$ be an arbitrary ring. A matrix $B \in R^{m \times n}$ is said to be in row echelon form if there exists a pivot sequence $(p_1, \ldots, p_r)$, with $0 \leq r \leq m$ and $1 \leq p_1 < p_2 < \cdots < p_r \leq n$, such that the following holds:

• for $i = 1, \ldots, r$, all of the entries in row $i$ of $B$ to the left of entry $(i, p_i)$ are zero;

• for $i = 1, \ldots, r$, we have $B(i, p_i) \neq 0_R$;

• all entries in rows $r+1, \ldots, m$ of $B$ are zero.

Exercise 14.19. Let $R$ be the ring $\mathbb{Z}_{p^e}$, where $p$ is prime and $e > 1$. Let $\pi := [p] \in R$. The goal of this exercise is to develop an efficient algorithm for the following problem: given a matrix $A \in R^{m \times n}$, with $m > n$, find a vector $v \in R^{1 \times m}$ such that $vA = 0^{1 \times n}$ but $v \notin \pi R^{1 \times m}$.

(a) Show how to modify the extended Gaussian elimination algorithm to solve the following problem: given a matrix $A \in R^{m \times n}$, compute $X \in R^{m \times m}$ and $B \in R^{m \times n}$, such that $XA = B$, $X$ is invertible, and $B$ is in row echelon form. Your algorithm should run in time $O(mn(m + n)e^2\,\mathrm{len}(p)^2)$. Assume that the input includes the values $p$ and $e$. Hint: when choosing a pivot element, select one divisible by a minimal power of $\pi$; as in ordinary Gaussian elimination, your algorithm should only use elementary row operations to transform the input matrix.

(b) Using the fact that the matrix $X$ computed in part (a) is invertible, argue that none of its rows belong to $\pi R^{1 \times m}$.

(c) Argue that if $m > n$ and the matrix $B$ computed in part (a) has pivot sequence $(p_1, \ldots, p_r)$, then $m - r > 0$ and if $v$ is any one of the last $m - r$ rows of $X$, then $vA = 0^{1 \times n}$.

(d) Give an example that shows that $\{\mathrm{Row}_i(B)\}_{i=1}^r$ need not be linearly independent, and that $\{\mathrm{Row}_i(X)\}_{i=r+1}^m$ need not span the kernel of the linear map $\lambda_A$ corresponding to $A$.

Exercise 14.20. Let $R$ be the ring $\mathbb{Z}_\ell$, where $\ell > 1$ is an integer. You are given a matrix $A \in R^{m \times n}$. Show how to efficiently compute $X \in R^{m \times m}$ and $B \in R^{m \times n}$ such that $XA = B$, $X$ is invertible, and $B$ is in row echelon form. Your algorithm should run in time $O(mn(m + n)\,\mathrm{len}(\ell)^2)$. Hint: to zero-out entries, you should use “rotations”: for integers $a, b, d, s, t$ with

$$d = \gcd(a, b) \neq 0 \quad \text{and} \quad as + bt = d,$$

and for row indices $r$ and $i$, a rotation simultaneously updates rows $r$ and $i$ of a matrix $C$ as follows:

$$(\mathrm{Row}_r(C), \mathrm{Row}_i(C)) \leftarrow \Big(s\,\mathrm{Row}_r(C) + t\,\mathrm{Row}_i(C),\ -\frac{b}{d}\,\mathrm{Row}_r(C) + \frac{a}{d}\,\mathrm{Row}_i(C)\Big);$$

observe that if $C(r, j) = [a]_\ell$ and $C(i, j) = [b]_\ell$ before applying the rotation, then $C(r, j) = [d]_\ell$ and $C(i, j) = [0]_\ell$ after the rotation.

Exercise 14.21. Consider again the setting in Exercise 14.3. Show that $A \in F^{m \times (m - \ell)}$ is a parity check matrix for $U$ if and only if $\{\mathrm{Col}_j(A)^T\}_{j=1}^{m-\ell}$ is a basis for the orthogonal complement of $\mathrm{Vecs}(U) \subseteq F^{1 \times m}$.

Exercise 14.22. Let $\{v_i\}_{i=1}^n$ be a family of vectors, where $v_i \in \mathbb{R}^{1 \times \ell}$ for each $i = 1, \ldots, n$. We say that $\{v_i\}_{i=1}^n$ is pairwise orthogonal if $v_i v_j^T = 0$ for all $i \neq j$. Show that every pairwise orthogonal family of non-zero vectors over $\mathbb{R}$ is linearly independent.

Exercise 14.23. The purpose of this exercise is to use linear algebra to prove that any pairwise independent family of hash functions (see §8.7) must contain a large number of hash functions. More precisely, let $\{\Phi_r\}_{r \in R}$ be a pairwise independent family of hash functions from $S$ to $T$, with $|T| \geq 2$. Our goal is to show that $|R| \geq |S|$. Let $n := |S|$, $m := |T|$, and $\ell := |R|$. Write $R = \{r_1, \ldots, r_\ell\}$ and $S = \{s_1, \ldots, s_n\}$. Without loss of generality, we may assume that $T$ is a set of non-zero real numbers that sum to zero (e.g., $T = \{1, \ldots, m-1, -m(m-1)/2\}$). Now define the matrix $A \in \mathbb{R}^{n \times \ell}$ with $A(i, j) := \Phi_{r_j}(s_i)$. Show that $\{\mathrm{Row}_i(A)\}_{i=1}^n$ is a pairwise orthogonal family of non-zero vectors (see previous exercise). From this, deduce that $\ell \geq n$.


14.6 Notes 

While a trivial application of the defining formulas yields a simple algorithm for multiplying two $n \times n$ matrices over a ring $R$ that uses $O(n^3)$ operations in $R$, this algorithm is not the best, asymptotically speaking. The currently fastest algorithm for this problem, due to Coppersmith and Winograd [28], uses $O(n^\omega)$ operations in $R$, where $\omega < 2.376$. We note, however, that the good old $O(n^3)$ algorithm is still the only one used in almost any practical setting.

This chapter presents subexponential-time algorithms for computing discrete logarithms and for factoring integers. These algorithms share a common technique, which makes essential use of the notion of a smooth number.


15.1 Smooth numbers 

If $y$ is a non-negative real number and $m$ is a positive integer, then we say that $m$ is $y$-smooth if all prime divisors of $m$ are at most $y$.

For $0 \leq y \leq x$, let us define $\Psi(y, x)$ to be the number of $y$-smooth integers up to $x$. The following theorem gives us a lower bound on $\Psi(y, x)$, which will be crucial in the analysis of our discrete logarithm and factoring algorithms.
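For small arguments, $\Psi(y, x)$ can be computed directly from the definition. The brute-force Python sketch below uses hypothetical function names `is_smooth` and `psi`. It is only practical for small $x$ and plays no role in the asymptotic analysis that follows.

```python
def is_smooth(m, y):
    """True if every prime divisor of the positive integer m is at most y."""
    d = 2
    while d * d <= m:
        while m % d == 0:
            if d > y:            # d is a prime divisor of m exceeding y
                return False
            m //= d
        d += 1
    return m == 1 or m <= y      # what is left is 1 or a single prime

def psi(y, x):
    """Psi(y, x): the number of y-smooth integers among 1, ..., floor(x), by brute force."""
    return sum(1 for m in range(1, int(x) + 1) if is_smooth(m, y))
```

For example, the 3-smooth integers up to 10 are 1, 2, 3, 4, 6, 8, 9, so `psi(3, 10)` is 7. Note that 1 counts as $y$-smooth for every $y \geq 0$.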

Theorem 15.1. Let $y$ be a function of $x$ such that

$$\frac{y}{\log x} \to \infty \quad \text{and} \quad u := \frac{\log x}{\log y} \to \infty$$

as $x \to \infty$. Then

$$\Psi(y, x) \geq x \cdot \exp[(-1 + o(1))\, u \log\log x].$$

Proof. Let us write $u = \lfloor u \rfloor + \delta$, where $0 \leq \delta < 1$. Let us split the primes up to $y$ into two sets: the set $V$ of “very small” primes that are at most $y^\delta/2$, and the set $W$ of other primes that are greater than $y^\delta/2$ but at most $y$. To simplify matters, let us also include the integer 1 in the set $V$.

By Bertrand’s postulate (Theorem 5.8), there exists a constant $C > 0$ such that $|W| \geq Cy/\log y$ for sufficiently large $y$. By the assumption that $y/\log x \to \infty$ as $x \to \infty$, we also have $|W| \geq 2\lfloor u \rfloor$ for sufficiently large $x$.

To derive the lower bound, we shall count those integers that can be built up by multiplying together $\lfloor u \rfloor$ distinct elements of $W$, together with one element of $V$. These products are clearly distinct, $y$-smooth numbers, and each is bounded by $x$, since each is at most $y^{\lfloor u \rfloor} y^\delta = y^u = x$.

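If $S$ denotes the set of all of these products, then for $x$ sufficiently large, we have

$$|S| = \binom{|W|}{\lfloor u \rfloor}|V| = \frac{|W|(|W|-1)\cdots(|W|-\lfloor u \rfloor+1)}{\lfloor u \rfloor!}\,|V| \geq \left(\frac{|W|}{2u}\right)^{\lfloor u \rfloor}|V| \geq \left(\frac{Cy}{2u\log y}\right)^{\lfloor u \rfloor}|V| = \left(\frac{Cy}{2\log x}\right)^{\lfloor u \rfloor}|V|.$$

Taking logarithms, we have

$$\begin{aligned}
\log|S| &\geq (u - \delta)(\log y - \log\log x + \log(C/2)) + \log|V| \\
&= \log x - u\log\log x + (\log|V| - \delta\log y) + O(u + \log\log x).
\end{aligned} \tag{15.1}$$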

To prove the theorem, it suffices to show that

$$\log|S| \geq \log x - (1 + o(1))\, u\log\log x.$$

Under our assumption that $u \to \infty$, the term $O(u + \log\log x)$ in (15.1) is clearly $o(u\log\log x)$, and so it will suffice to show that the term $(\log|V| - \delta\log y)$ is also $o(u\log\log x)$. But by Chebyshev’s theorem (Theorem 5.1), for some positive constant $D$, we have

$$Dy^\delta/\log y \leq |V| \leq y^\delta,$$

and taking logarithms, and again using the fact that $u \to \infty$, we have

$$\log|V| - \delta\log y = O(\log\log y) = o(u\log\log x).$$

□


15.2 An algorithm for discrete logarithms 

We now present a probabilistic, subexponential-time algorithm for computing discrete logarithms. The input to the algorithm is $p, q, \gamma, \alpha$, where $p$ and $q$ are primes, with $q \mid (p - 1)$, $\gamma$ is an element of $\mathbb{Z}_p^*$ generating a subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and $\alpha \in G$.

We shall make the simplifying assumption that $q^2 \nmid (p - 1)$, which is equivalent to saying that $q \nmid m := (p - 1)/q$. Although not strictly necessary, this assumption simplifies the design and analysis of the algorithm, and moreover, for cryptographic applications, this assumption is almost always satisfied. Exercises 15.1-15.3 below explore how this assumption may be lifted, as well as other generalizations.

At a high level, the main goal of our discrete logarithm algorithm is to find a random representation of 1 with respect to $\gamma$ and $\alpha$; as discussed in Exercise 11.12, this allows us to compute $\log_\gamma \alpha$ (with high probability). More precisely, our main goal is to compute integers $r$ and $s$ in a probabilistic fashion, such that $\gamma^r\alpha^s = 1$ and $[s]_q$ is uniformly distributed over $\mathbb{Z}_q$. Having accomplished this, then with probability $1 - 1/q$, we shall have $s \not\equiv 0 \pmod{q}$, which allows us to compute $\log_\gamma \alpha$ as $-rs^{-1} \bmod q$.

Let $H$ be the subgroup of $\mathbb{Z}_p^*$ of order $m$. Our assumption that $q \nmid m$ implies that $G \cap H = \{1\}$, since the multiplicative order of any element in the intersection must divide both $q$ and $m$, and so the only possibility is that the multiplicative order is 1. Therefore, the map $\rho : G \times H \to \mathbb{Z}_p^*$ that sends $(\beta, \delta)$ to $\beta\delta$ is injective (Theorem 6.25), and since $|\mathbb{Z}_p^*| = qm$, it must be surjective as well.

We shall use this fact in the following way: if $\beta$ is chosen uniformly at random from $G$, and $\delta$ is chosen uniformly at random from $H$ (and independent of $\beta$), then $\beta\delta$ is uniformly distributed over $\mathbb{Z}_p^*$. Furthermore, since $H$ is the image of the $q$-power map on $\mathbb{Z}_p^*$, we may generate a random $\delta \in H$ simply by choosing $\hat{\delta} \in \mathbb{Z}_p^*$ at random, and setting $\delta := \hat{\delta}^q$.
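The decomposition $\mathbb{Z}_p^* \cong G \times H$ can be verified by brute force for toy parameters. The Python sketch below uses a hypothetical function name and assumes that $p$ and $q$ are small primes with $q \mid p - 1$, $q \nmid m$, and $\gamma$ of order $q$. Elements of $\mathbb{Z}_p^*$ are represented by integers in $\{1, \ldots, p-1\}$. For example, $p = 11$, $q = 5$, $\gamma = [3]_{11}$ qualifies, since $3^5 = 243 \equiv 1 \pmod{11}$ and $m = 2$.

```python
def check_decomposition(p, q, gamma):
    """Brute-force check that (beta, delta) -> beta*delta is a bijection G x H -> Z_p^*.

    Assumes p, q prime, q | p - 1, q does not divide m = (p - 1)/q, and gamma of order q.
    """
    m = (p - 1) // q
    assert (p - 1) % q == 0 and m % q != 0
    G = {pow(gamma, e, p) for e in range(q)}           # subgroup generated by gamma
    H = {pow(d, q, p) for d in range(1, p)}            # image of the q-power map
    products = {b * d % p for b in G for d in H}
    return len(G) == q and len(H) == m and G & H == {1} and products == set(range(1, p))
```

Since $|G| \cdot |H| = qm = p - 1$, the set of all products having $p - 1$ elements is the same as the map being a bijection.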

The discrete logarithm algorithm uses a “smoothness parameter” $y$. We will discuss choice of $y$ below, when we analyze the running time of the algorithm; for now, we only assume that $y < p$. Let $p_1, \ldots, p_k$ be an enumeration of the primes up to $y$. Let $\pi_i := [p_i]_p \in \mathbb{Z}_p^*$ for $i = 1, \ldots, k$.

The algorithm has two stages. 

In the first stage, we find relations of the form

$$\gamma^{r_i}\alpha^{s_i}\delta_i = \pi_1^{e_{i1}} \cdots \pi_k^{e_{ik}} \tag{15.2}$$

for $i = 1, \ldots, k+1$, where $r_i, s_i, e_{i1}, \ldots, e_{ik} \in \mathbb{Z}$ and $\delta_i \in H$ for each $i$.

We obtain each such relation by a randomized search, as follows: we choose $r_i, s_i \in \{0, \ldots, q-1\}$ at random, as well as $\hat{\delta}_i \in \mathbb{Z}_p^*$ at random; we then compute $\delta_i := \hat{\delta}_i^q$, $\beta_i := \gamma^{r_i}\alpha^{s_i}$, and $m_i := \mathrm{rep}(\beta_i\delta_i)$. Now, the value $\beta_i$ is uniformly distributed over $G$, while $\delta_i$ is uniformly distributed over $H$; therefore, the product $\beta_i\delta_i$ is uniformly distributed over $\mathbb{Z}_p^*$, and hence $m_i$ is uniformly distributed over $\{1, \ldots, p-1\}$. Next, we simply try to factor $m_i$ by trial division, trying all the primes $p_1, \ldots, p_k$ up to $y$. If we are lucky, we completely factor $m_i$ in this way, obtaining a factorization

$$m_i = p_1^{e_{i1}} \cdots p_k^{e_{ik}},$$

for some exponents $e_{i1}, \ldots, e_{ik}$, and we get the relation (15.2). If we are unlucky, then we simply keep trying until we are lucky.
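The trial-division step can be sketched as follows. This is Python with a hypothetical helper name, where `primes` stands for the list $p_1, \ldots, p_k$. The function returns the exponent vector $(e_{i1}, \ldots, e_{ik})$ when $m_i$ is $y$-smooth, and `None` when the attempt is unlucky.

```python
def trial_division(m, primes):
    """Exponent vector (e_1, ..., e_k) of m over primes, or None if m does not factor completely."""
    exps = []
    for pr in primes:
        e = 0
        while m % pr == 0:      # divide out pr as often as possible
            m //= pr
            e += 1
        exps.append(e)
    return exps if m == 1 else None
```

Each call performs at most $k + O(\mathrm{len}(p))$ divisions, since every successful division at least halves $m$.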

For $i = 1, \ldots, k+1$, let $v_i := (e_{i1}, \ldots, e_{ik}) \in \mathbb{Z}^{1 \times k}$, and let $\bar{v}_i$ denote the image of $v_i$ in $\mathbb{Z}_q^{1 \times k}$ (i.e., $\bar{v}_i := ([e_{i1}]_q, \ldots, [e_{ik}]_q)$). Since $\mathbb{Z}_q^{1 \times k}$ is a vector space over the field $\mathbb{Z}_q$ of dimension $k$, the family of vectors $\bar{v}_1, \ldots, \bar{v}_{k+1}$ must be linearly dependent. The second stage of the algorithm uses Gaussian elimination over $\mathbb{Z}_q$ (see §14.4) to find a linear dependence among the vectors $\bar{v}_1, \ldots, \bar{v}_{k+1}$, that is, to find integers $c_1, \ldots, c_{k+1} \in \{0, \ldots, q-1\}$, not all zero, such that

$$(e_1, \ldots, e_k) := c_1 v_1 + \cdots + c_{k+1} v_{k+1} \in q\mathbb{Z}^{1 \times k}.$$

Raising each equation (15.2) to the corresponding power $c_i$, and multiplying them all together, we obtain

$$\gamma^r\alpha^s\delta = \pi_1^{e_1} \cdots \pi_k^{e_k},$$

where

$$r := \sum_{i=1}^{k+1} c_i r_i, \quad s := \sum_{i=1}^{k+1} c_i s_i, \quad \text{and} \quad \delta := \prod_{i=1}^{k+1} \delta_i^{c_i}.$$

Now, $\delta \in H$, and since each $e_j$ is a multiple of $q$, we also have $\pi_j^{e_j} \in H$ for $j = 1, \ldots, k$. It follows that $\gamma^r\alpha^s \in H$. But since $\gamma^r\alpha^s \in G$ as well, and $G \cap H = \{1\}$, it follows that $\gamma^r\alpha^s = 1$. If we are lucky (and we will be with overwhelming probability, as we discuss below), we will have $s \not\equiv 0 \pmod{q}$, in which case, we can compute $s' := s^{-1} \bmod q$, obtaining

$$\alpha = \gamma^{-rs'},$$

and hence $-rs' \bmod q$ is the discrete logarithm of $\alpha$ to the base $\gamma$. If we are very unlucky, we will have $s \equiv 0 \pmod{q}$, at which point the algorithm simply quits, reporting “failure.”

The entire algorithm, called Algorithm SEDL, is presented in Fig. 15.1. 
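To make the two stages concrete, here is a toy Python sketch of Algorithm SEDL. It assumes tiny parameters, such as $p = 11$, $q = 5$, $\gamma = [3]_{11}$ or $p = 1019$, $q = 509$, $\gamma = [4]_{1019}$, both of which satisfy $q \nmid m = 2$. Elements of $\mathbb{Z}_p^*$ are represented by integers in $\{1, \ldots, p-1\}$, and primes up to $y$ are found by naive trial division. The linear dependence is taken from the last row of the matrix $X$ produced by extended Gaussian elimination over $\mathbb{Z}_q$: with $k + 1$ vectors in a $k$-dimensional space, that row lies in the row null space, by the discussion in §14.5. All function names are hypothetical, `None` stands for "failure", and the sketch is meant for illustration, not for realistic parameter sizes.

```python
import random

def primes_up_to(y):
    return [n for n in range(2, int(y) + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]

def trial_division(m, primes):
    """Exponent vector of m over primes, or None if m is not smooth over them."""
    exps = []
    for pr in primes:
        e = 0
        while m % pr == 0:
            m //= pr
            e += 1
        exps.append(e)
    return exps if m == 1 else None

def dependency_mod_q(vectors, q):
    """Non-zero c with sum_i c_i * vectors[i] = 0 over Z_q, for more vectors than coordinates."""
    m, n = len(vectors), len(vectors[0])
    B = [[e % q for e in v] for v in vectors]
    X = [[int(i == j) for j in range(m)] for i in range(m)]
    r = 0
    for j in range(n):                      # extended Gaussian elimination over Z_q
        l = next((i for i in range(r, m) if B[i][j]), None)
        if l is None:
            continue
        B[r], B[l] = B[l], B[r]
        X[r], X[l] = X[l], X[r]
        inv = pow(B[r][j], -1, q)
        B[r] = [b * inv % q for b in B[r]]
        X[r] = [x * inv % q for x in X[r]]
        for i in range(m):
            c = B[i][j]
            if i != r and c:
                B[i] = [(a - c * b) % q for a, b in zip(B[i], B[r])]
                X[i] = [(a - c * b) % q for a, b in zip(X[i], X[r])]
        r += 1
    return X[-1]                            # rank r <= n < m, so this row is in the null space

def sedl(p, q, gamma, alpha, y, rng):
    """Toy Algorithm SEDL: return log_gamma(alpha) mod q, or None for "failure"."""
    primes = primes_up_to(y)
    k = len(primes)
    relations = []                                   # triples (r_i, s_i, v_i)
    while len(relations) < k + 1:                    # stage 1
        r_i, s_i = rng.randrange(q), rng.randrange(q)
        delta_i = pow(rng.randrange(1, p), q, p)     # uniform element of H
        m_i = pow(gamma, r_i, p) * pow(alpha, s_i, p) * delta_i % p
        v_i = trial_division(m_i, primes)
        if v_i is not None:                          # m_i is y-smooth: relation (15.2)
            relations.append((r_i, s_i, v_i))
    c = dependency_mod_q([v for _, _, v in relations], q)   # stage 2
    r = sum(ci * ri for ci, (ri, _, _) in zip(c, relations))
    s = sum(ci * si for ci, (_, si, _) in zip(c, relations))
    if s % q == 0:
        return None
    return (-r * pow(s, -1, q)) % q
```

For instance, with `rng = random.Random(0)`, each call `sedl(11, 5, 3, pow(3, x, 11), 3, rng)` should return either `None` or `x % 5`.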

As already argued above, if Algorithm SEDL does not output “failure,” then its output is indeed the discrete logarithm of $\alpha$ to the base $\gamma$. There remain three questions to answer:

1. What is the expected running time of Algorithm SEDL? 

2. How should the smoothness parameter y be chosen so as to minimize the 
expected running time? 

3. What is the probability that Algorithm SEDL outputs “failure”? 

    i ← 0
    repeat
        i ← i + 1
        repeat
            choose r_i, s_i ∈ {0, ..., q − 1} at random
            choose δ̂_i ∈ Z_p^* at random
            β_i ← γ^(r_i) α^(s_i),  δ_i ← δ̂_i^q,  m_i ← rep(β_i δ_i)
            test if m_i is y-smooth (trial division)
        until m_i = p_1^(e_i1) ··· p_k^(e_ik) for some integers e_i1, ..., e_ik
    until i = k + 1
    set v_i ← (e_i1, ..., e_ik) ∈ Z^(1×k) for i = 1, ..., k + 1
    apply Gaussian elimination over Z_q to find integers c_1, ..., c_(k+1) ∈ {0, ..., q − 1},
        not all zero, such that c_1 v_1 + ··· + c_(k+1) v_(k+1) ∈ qZ^(1×k)
    r ← Σ_(i=1)^(k+1) c_i r_i,  s ← Σ_(i=1)^(k+1) c_i s_i
    if s ≡ 0 (mod q)
        then output “failure”
        else output −r s^(−1) mod q

Fig. 15.1. Algorithm SEDL

Let us address these questions in turn. As for the expected running time, let $\sigma$ be the probability that a random element of $\{1, \ldots, p-1\}$ is $y$-smooth. Then the expected number of attempts needed to produce a single relation is $\sigma^{-1}$, and so the expected number of attempts to produce $k+1$ relations is $(k+1)\sigma^{-1}$. In each attempt, we perform trial division using $p_1, \ldots, p_k$, along with a few other minor computations, leading to a total expected running time in stage 1 of $k^2\sigma^{-1} \cdot \mathrm{len}(p)^{O(1)}$. The running time in stage 2 is dominated by the Gaussian elimination step, which takes time $k^3 \cdot \mathrm{len}(p)^{O(1)}$. Thus, if $Z$ is the total running time of the algorithm, then we have

$$\mathrm{E}[Z] \leq (k^2\sigma^{-1} + k^3) \cdot \mathrm{len}(p)^{O(1)}. \tag{15.3}$$

Let us assume for the moment that

$$y = \exp[(\log p)^{\lambda + o(1)}] \tag{15.4}$$

for some constant $\lambda$ with $0 < \lambda < 1$. Our final choice of $y$ will indeed satisfy this assumption. Consider the probability $\sigma$. We have

$$\sigma = \Psi(y, p-1)/(p-1) = \Psi(y, p)/(p-1) \geq \Psi(y, p)/p,$$

where for the second equality we use the assumption that $y < p$, so $p$ is not $y$-smooth. With our assumption (15.4), we may apply Theorem 15.1 (with the given value of $y$ and $x := p$), obtaining

$$\sigma \geq \exp[(-1 + o(1))(\log p/\log y)\log\log p].$$

By Chebyshev’s theorem (Theorem 5.1), we know that $k = \Theta(y/\log y)$, and so $\log k = (1 + o(1))\log y$. Moreover, assumption (15.4) implies that the factor $\mathrm{len}(p)^{O(1)}$ in (15.3) is of the form $\exp[o(\min(\log y, \log p/\log y))]$, and so we have

$$\mathrm{E}[Z] \leq \exp[(1 + o(1))\max\{(\log p/\log y)\log\log p + 2\log y,\ 3\log y\}]. \tag{15.5}$$

Let us find the value of $y$ that minimizes the right-hand side of (15.5), ignoring the “$o(1)$” terms. Let $\mu := \log y$, $A := \log p\log\log p$, $S_1 := A/\mu + 2\mu$, and $S_2 := 3\mu$. We want to find $\mu$ that minimizes $\max\{S_1, S_2\}$. Using a little calculus (setting the derivative $-A/\mu^2 + 2$ of $S_1$ to zero), one sees that $S_1$ is minimized at $\mu = (A/2)^{1/2}$. With this choice of $\mu$, we have $S_1 = (2\sqrt{2})A^{1/2}$ and $S_2 = (3/\sqrt{2})A^{1/2} < S_1$. Thus, choosing

$$y = \exp[(1/\sqrt{2})(\log p\log\log p)^{1/2}],$$

we obtain

$$\mathrm{E}[Z] \leq \exp[(2\sqrt{2} + o(1))(\log p\log\log p)^{1/2}].$$
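The calculus step can be double-checked numerically. This Python sketch uses a hypothetical function name and natural logarithms. It simply ignores the $o(1)$ terms, so the numbers it produces are only indicative and are not running-time predictions. For a given $\log p$, it evaluates $A$, the optimal $\mu = \log y$, and $S_1$ and $S_2$.

```python
import math

def sedl_exponents(log_p):
    """Evaluate A, mu, S1, S2 from the optimization above, with all o(1) terms dropped.

    log_p is the natural logarithm of p.
    """
    A = log_p * math.log(log_p)     # A := log p * log log p
    mu = math.sqrt(A / 2)           # mu = log y = (A/2)^(1/2) minimizes S1
    S1 = A / mu + 2 * mu            # equals 2*sqrt(2) * A^(1/2)
    S2 = 3 * mu                     # equals (3/sqrt(2)) * A^(1/2), smaller than S1
    return A, mu, S1, S2
```

For a 1024-bit prime one would call `sedl_exponents(1024 * math.log(2))`. Then $e^{\mu}$ is the suggested smoothness bound $y$ and $e^{S_1}$ is the running-time estimate with the $o(1)$ term removed.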

That takes care of the first two questions, although strictly speaking, we have 
only obtained an upper bound for the expected running time, and we have not 
shown that the choice of y is actually optimal, but we shall nevertheless content 
ourselves (for now) with these results. Finally, we deal with the third question, on 
the probability that the algorithm outputs “failure.” 

Lemma 15.2. The probability that Algorithm SEDL outputs “failure” is 1/q. 

Proof. Let $F$ be the event that the algorithm outputs “failure.” For $i = 1, \ldots, k+1$, we may view the final values assigned to $r_i$, $s_i$, $\delta_i$, and $m_i$ as random variables, which we shall denote by these same names (to avoid additional notation). Similarly, we may view $s$ as a random variable.

Let $m'_1, \ldots, m'_{k+1}$ be arbitrary, fixed $y$-smooth numbers, and let $B$ be the event that $m_1 = m'_1, \ldots, m_{k+1} = m'_{k+1}$. We shall show that $P[F \mid B] = 1/q$, and since this holds for all relevant $B$, it follows by total probability that $P[F] = 1/q$.

For the rest of the argument, we focus on the conditional distribution given $B$. With respect to this conditional distribution, the distribution of each random variable $(r_i, s_i, \delta_i)$ is (essentially) the uniform distribution on the set

$$P_i := \{(r', s', \delta') \in I_q \times I_q \times H : \gamma^{r'} \alpha^{s'} \delta' = [m'_i]_p\},$$

where $I_q := \{0, \ldots, q-1\}$; also, the family of random variables $\{(r_i, s_i, \delta_i)\}_{i=1}^{k+1}$
is mutually independent. It is easy to see that for $i = 1, \ldots, k+1$, and for each $s' \in I_q$, there exist unique values $r' \in I_q$ and $\delta' \in H$ such that $(r', s', \delta') \in P_i$. From this, it easily follows that each $s_i$ is uniformly distributed over $I_q$, and the family of random variables $\{s_i\}_{i=1}^{k+1}$ is mutually independent. Also, the values $c_1, \ldots, c_{k+1}$ computed by the algorithm are fixed (as they are determined by $m'_1, \ldots, m'_{k+1}$), and since $s = c_1 s_1 + \cdots + c_{k+1} s_{k+1}$, and not all the $c_i$'s are zero modulo $q$, it follows that $s \bmod q$ is uniformly distributed over $I_q$, and so is equal to zero with probability $1/q$. □
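The last step of the proof uses the fact that if $s_1, \ldots, s_m$ are independent and uniform over $I_q$ ($q$ prime) and some $c_i \not\equiv 0 \pmod{q}$, then $c_1 s_1 + \cdots + c_m s_m$ is uniform modulo $q$. For tiny parameters this can be confirmed by exhaustive counting. The sketch below (the function name is our own) counts the tuples for which the combination vanishes; under the hypothesis this should be exactly $q^{m-1}$ of the $q^m$ tuples.

```python
from itertools import product


def count_zero_combinations(q, c):
    """Count tuples (s_1, ..., s_m) in {0, ..., q-1}^m with
    c_1*s_1 + ... + c_m*s_m = 0 (mod q), by exhaustive enumeration."""
    return sum(
        1
        for s in product(range(q), repeat=len(c))
        if sum(ci * si for ci, si in zip(c, s)) % q == 0
    )


# q = 7 is prime and not all coefficients vanish mod 7:
# exactly 7**2 = 49 of the 7**3 tuples give 0, i.e. probability 1/q.
print(count_zero_combinations(7, [3, 0, 5]))
# If every c_i is 0 mod q, the combination is always 0 (all 25 tuples).
print(count_zero_combinations(5, [0, 10]))
```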

Let us summarize the above discussion in the following theorem.

Theorem 15.3. With the smoothness parameter set as

$$y := \exp[(1/\sqrt{2})(\log p \log\log p)^{1/2}],$$

the expected running time of Algorithm SEDL is at most

$$\exp[(2\sqrt{2} + o(1))(\log p \log\log p)^{1/2}].$$

The probability that Algorithm SEDL outputs “failure” is $1/q$.

In the description and analysis of Algorithm SEDL, we have assumed that the primes $p_1, \ldots, p_k$ were pre-computed. Of course, we can construct this list of primes using, for example, the sieve of Eratosthenes (see §5.4), and the running time of this pre-computation will be dominated by the running time of Algorithm SEDL.

In the analysis of Algorithm SEDL, we relied crucially on the fact that in generating a relation, each candidate element $\gamma^{r_i} \alpha^{s_i} \delta_i$ was uniformly distributed over $\mathbb{Z}_p^*$. If we simply left out the $\delta_i$'s, then the candidate element would be uniformly distributed over the subgroup $G$, and Theorem 15.1 simply would not apply. Although the algorithm might anyway work as expected, we would not be able to prove this.


Exercise 15.1. Using the result of Exercise 14.19, show how to modify Algorithm SEDL to work in the case where $p - 1 = q^e m$, $e > 1$, $q \nmid m$, $\gamma$ generates the subgroup $G$ of $\mathbb{Z}_p^*$ of order $q^e$, and $\alpha \in G$. Your algorithm should compute $\log_\gamma \alpha$ with roughly the same expected running time and success probability as Algorithm SEDL.

Exercise 15.2. Using the algorithm of the previous exercise as a subroutine, design and analyze an algorithm for the following problem. The input is $p, q, \gamma, \alpha$, where $p$ is a prime, $q$ is a prime dividing $p - 1$, $\gamma$ generates the subgroup $G$ of $\mathbb{Z}_p^*$ of order $q$, and $\alpha \in G$; note that we may have $q^2 \mid (p - 1)$. The output is $\log_\gamma \alpha$. Your algorithm should always succeed in computing this discrete logarithm, and its expected running time should be bounded by a constant times the expected running time of the algorithm of the previous exercise.

Exercise 15.3. Using the result of Exercise 14.20, show how to modify Algorithm SEDL to solve the following problem: given a prime $p$, a generator $\gamma$ for $\mathbb{Z}_p^*$, and an element $\alpha \in \mathbb{Z}_p^*$, compute $\log_\gamma \alpha$. Your algorithm should work without knowledge of the factorization of $p - 1$; its expected running time should be roughly the same as that of Algorithm SEDL, but its success probability may be lower. In addition, explain how the success probability may be significantly increased at almost no cost by collecting a few extra relations.

Exercise 15.4. Let $n = pq$, where $p$ and $q$ are distinct, large primes. Let $e$ be a prime, with $e < n$ and $e \nmid (p-1)(q-1)$. Let $x$ be a positive integer, with $x < n$. Suppose you are given $n$ (but not its factorization!) along with $e$ and $x$. In addition, you are given access to two “oracles,” which you may invoke as often as you like.

• The first oracle is a “challenge oracle”: each invocation of the oracle produces a “challenge” $a \in \{1, \ldots, x\}$, distributed uniformly, and independent of all other challenges.

• The second oracle is a “solution oracle”: you invoke this oracle with the index of a previous invocation of the challenge oracle; if the corresponding challenge was $a$, the solution oracle returns the $e$th root of $a$ modulo $n$; that is, the solution oracle returns $b \in \{1, \ldots, n-1\}$ such that $b^e \equiv a \pmod{n}$. Note that $b$ always exists and is uniquely determined.

Let us say that you “win” if you are able to compute the $e$th root modulo $n$ of any challenge, but without invoking the solution oracle with the corresponding index of the challenge (otherwise, winning would be trivial, of course).

(a) Design a probabilistic algorithm that wins the above game, using an expected number of

$$\exp[(c + o(1))(\log x \log\log x)^{1/2}] \cdot \mathrm{len}(n)^{O(1)}$$

steps, for some constant $c$, where a “step” is either a computation step or an oracle invocation (either challenge or solution). Hint: Gaussian elimination over the field $\mathbb{Z}_e$.

(b) Suppose invocations of the challenge oracle are “cheap,” while invocations of the solution oracle are relatively “expensive.” How would you modify your strategy in part (a)?

Exercise 15.4 has implications in cryptography. A popular way of implementing a public-key primitive known as a “digital signature” works as follows: to digitally sign a message $M$ (which may be an arbitrarily long bit string), first apply
a “hash function” or “message digest” $H$ to $M$, obtaining an integer $a$ in some fixed range $\{1, \ldots, x\}$, and then compute the signature of $M$ as the $e$th root $b$ of $a$ modulo $n$. Anyone can verify that such a signature $b$ is correct by checking that $b^e \equiv H(M) \pmod{n}$; however, it would appear to be difficult to “forge” a signature without knowing the factorization of $n$. Indeed, one can prove the security of this signature scheme by assuming that it is hard to compute the $e$th root of a random number modulo $n$, and by making the heuristic assumption that $H$ is a random function (see §15.5). However, for this proof to work, the value of $x$ must be close to $n$; otherwise, if $x$ is significantly smaller than $n$, then by the result of this exercise, one can break the signature scheme at a cost that is roughly the same as the cost of factoring numbers around the size of $x$, rather than the size of $n$.


15.3 An algorithm for factoring integers 

We now present a probabilistic, subexponential-time algorithm for factoring integers. The algorithm uses techniques very similar to those used in Algorithm SEDL in §15.2.

Let $n > 1$ be the integer we want to factor. We make a few simplifying assumptions. First, we assume that $n$ is odd; this is not a real restriction, since we can always pull out any factors of 2 in a pre-processing step. Second, we assume that $n$ is not a perfect power, that is, not of the form $a^b$ for integers $a > 1$ and $b > 1$; this is also not a real restriction, since we can always partially factor $n$ using the algorithm from Exercise 3.31 if $n$ is a perfect power. Third, we assume that $n$ is not prime; this may be efficiently checked using, say, the Miller-Rabin test (see §10.2). Fourth, we assume that $n$ is not divisible by any primes up to a “smoothness parameter” $y$; we can ensure this using trial division, and it will be clear that the running time of this pre-computation is dominated by that of the algorithm itself. With these assumptions, the prime factorization of $n$ is of the form

$$n = q_1^{f_1} \cdots q_w^{f_w},$$

where $w > 1$, the $q_i$'s are distinct, odd primes, each greater than $y$, and the $f_i$'s are positive integers.
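The four reductions can be sketched as a single pre-processing routine. This is a minimal sketch under our own naming (`preprocess`, `integer_root`, `perfect_power` and `is_probable_prime` are hypothetical helpers, not from the text): the perfect-power check is a straightforward integer-root search and is not necessarily the algorithm of Exercise 3.31, and primality is tested with a standard randomized Miller-Rabin loop.

```python
import random


def integer_root(n, k):
    """Largest integer r >= 0 with r**k <= n, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def perfect_power(n):
    """Return (a, b) with n == a**b, a > 1, b > 1, or None."""
    for b in range(2, n.bit_length() + 1):
        a = integer_root(n, b)
        if a > 1 and a ** b == n:
            return a, b
    return None


def is_probable_prime(n, rounds=20):
    """Miller-Rabin; a composite n passes with probability <= 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False              # witness found: n is composite
    return True


def preprocess(n, y):
    """For composite n > 1: return a nontrivial factor found while enforcing
    the assumptions, or None if n is odd, not a perfect power, not prime,
    and has no prime factor <= y."""
    if n % 2 == 0:
        return 2
    pp = perfect_power(n)
    if pp is not None:
        return pp[0]
    if is_probable_prime(n):
        raise ValueError("n is (probably) prime; nothing to factor")
    for d in range(3, int(y) + 1, 2):  # trial division up to y
        if n % d == 0:
            return d
    return None


print(preprocess(10403, 30))   # 10403 = 101 * 103 meets all four assumptions
```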

The main goal of our factoring algorithm is to find a random square root of 1 in $\mathbb{Z}_n^*$. Let

$$\theta : \mathbb{Z}_n \to \mathbb{Z}_{q_1^{f_1}} \times \cdots \times \mathbb{Z}_{q_w^{f_w}}, \qquad [a]_n \mapsto ([a]_{q_1^{f_1}}, \ldots, [a]_{q_w^{f_w}})$$

be the ring isomorphism of the Chinese remainder theorem. The square roots of 1 in $\mathbb{Z}_n^*$ are precisely those elements $\gamma \in \mathbb{Z}_n^*$ such that $\theta(\gamma) = (\pm 1, \ldots, \pm 1)$ (each group $\mathbb{Z}_{q_i^{f_i}}^*$ is cyclic of even order, so its only square roots of 1 are $\pm 1$). If $\gamma$ is a random square root of 1, then with probability $1 - 2^{-w+1} \ge 1/2$, we have $\theta(\gamma) = (\gamma_1, \ldots, \gamma_w)$, where the $\gamma_i$'s are neither all 1 nor all $-1$ (i.e., $\gamma \ne \pm 1$). If this happens, then $\theta(\gamma - 1) = (\gamma_1 - 1, \ldots, \gamma_w - 1)$, and so we see that some, but not all, of the values $\gamma_i - 1$ will be zero. The value of $\gcd(\mathrm{rep}(\gamma - 1), n)$ is precisely the product of the prime powers $q_i^{f_i}$ such that $\gamma_i - 1 = 0$, and hence this gcd will yield a non-trivial factorization of $n$, unless $\gamma = \pm 1$.
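For a toy modulus, this structure can be checked by brute force. The sketch below (the helper name is ours) uses $n = 105 = 3 \cdot 5 \cdot 7$, so $w = 3$; it ignores the assumption that $n$ has no small prime factors, which plays no role in this illustration. There should be $2^w = 8$ square roots of 1, and $\gcd(\mathrm{rep}(\gamma - 1), n)$ should be a proper factor of $n$ exactly when $\gamma \ne \pm 1$.

```python
from math import gcd


def square_roots_of_one(n):
    """All x in {1, ..., n-1} with gcd(x, n) = 1 and x^2 = 1 (mod n); brute force."""
    return [x for x in range(1, n) if gcd(x, n) == 1 and x * x % n == 1]


n = 105                                   # 3 * 5 * 7, so w = 3
roots = square_roots_of_one(n)
print(len(roots))                         # 2**3 = 8
for g in roots:
    d = gcd(g - 1, n)                     # gcd(rep(gamma - 1), n)
    kind = "trivial" if g in (1, n - 1) else "splits n"
    print(g, d, kind)
```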


Let $p_1, \ldots, p_k$ be the primes up to the smoothness parameter $y$ mentioned above. Let $\pi_i := [p_i]_n \in \mathbb{Z}_n^*$ for $i = 1, \ldots, k$.

We first describe a simplified version of the algorithm, after which we modify the algorithm slightly to deal with a technical problem. Like Algorithm SEDL, this algorithm proceeds in two stages. In the first stage, we find relations of the form

$$\alpha_i^2 = \pi_1^{e_{i1}} \cdots \pi_k^{e_{ik}}, \qquad (15.6)$$

for $i = 1, \ldots, k+1$, where $e_{i1}, \ldots, e_{ik} \in \mathbb{Z}$ and $\alpha_i \in \mathbb{Z}_n^*$ for each $i$.

We can obtain each such relation by randomized search, as follows: we select $\alpha_i \in \mathbb{Z}_n^*$ at random, square it, and try to factor $m_i := \mathrm{rep}(\alpha_i^2)$ by trial division, trying all the primes $p_1, \ldots, p_k$ up to $y$. If we are lucky, we obtain a factorization

$$m_i = p_1^{e_{i1}} \cdots p_k^{e_{ik}},$$

for some exponents $e_{i1}, \ldots, e_{ik}$, yielding the relation (15.6); if not, we just keep trying.
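The randomized search for a single relation of the form (15.6) can be sketched as follows. This is only an illustration under our own naming (`smooth_exponents`, `find_relation`); it picks $\alpha_i$ as a random integer in $[1, n-1]$ and simply retries when that integer is not a unit modulo $n$.

```python
import math
import random


def smooth_exponents(m, primes):
    """Trial division: return [e_1, ..., e_k] with m == prod(p_j**e_j) if m
    factors completely over `primes`, otherwise None."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    exps = []
    for p in primes:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None


def find_relation(n, primes, rng=random):
    """Repeat: pick a random unit alpha, set m = rep(alpha^2), and try to
    factor m over `primes`; return (alpha, exponents) on success."""
    while True:
        alpha = rng.randrange(1, n)
        if math.gcd(alpha, n) != 1:
            continue                     # not in Z_n^*; try again
        exps = smooth_exponents(alpha * alpha % n, primes)
        if exps is not None:
            return alpha, exps


primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # the primes up to y = 30
alpha, e = find_relation(10403, primes, random.Random(1))
print(alpha, e)
```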

For $i = 1, \ldots, k+1$, let $v_i := (e_{i1}, \ldots, e_{ik}) \in \mathbb{Z}^{\times k}$, and let $\bar{v}_i$ denote the image of $v_i$ in $\mathbb{Z}_2^{\times k}$ (i.e., $\bar{v}_i := ([e_{i1}]_2, \ldots, [e_{ik}]_2)$). Since $\mathbb{Z}_2^{\times k}$ is a vector space over the field $\mathbb{Z}_2$ of dimension $k$, the family of vectors $\bar{v}_1, \ldots, \bar{v}_{k+1}$ must be linearly dependent. The second stage of the algorithm uses Gaussian elimination over $\mathbb{Z}_2$ to find a linear dependence among the vectors $\bar{v}_1, \ldots, \bar{v}_{k+1}$, that is, to find integers $c_1, \ldots, c_{k+1} \in \{0, 1\}$, not all zero, such that

$$(e_1, \ldots, e_k) := c_1 v_1 + \cdots + c_{k+1} v_{k+1} \in 2\mathbb{Z}^{\times k}.$$
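Stage 2 only needs the exponent vectors modulo 2. One compact way to sketch Gaussian elimination over $\mathbb{Z}_2$ is to store each reduced vector $\bar{v}_i$ as a bitmask, together with a second bitmask recording which of the original vectors were added together to produce it; when a vector reduces to zero, that record is a valid choice of $c_1, \ldots, c_{k+1}$. The function name and the bitmask representation are our own choices, not the text's.

```python
def find_dependency(vectors):
    """Return c = [c_1, ..., c_m] in {0,1}^m, not all zero, such that
    c_1*v_1 + ... + c_m*v_m has only even entries, or None if the
    images of the vectors mod 2 are linearly independent."""
    pivots = {}                    # leading bit -> (reduced row, combination)
    for i, v in enumerate(vectors):
        row = sum(1 << j for j, e in enumerate(v) if e % 2)   # image mod 2
        combo = 1 << i             # this row currently equals v_i alone
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = (row, combo)
                break
            prow, pcombo = pivots[top]
            row ^= prow            # add the pivot row over Z_2
            combo ^= pcombo
        else:
            # row became zero: combo records a nontrivial dependency
            return [(combo >> j) & 1 for j in range(len(vectors))]
    return None


vs = [[1, 0, 3], [0, 2, 1], [5, 1, 0], [2, 1, 1]]   # k + 1 = 4 vectors, k = 3
c = find_dependency(vs)
total = [sum(ci * v[j] for ci, v in zip(c, vs)) for j in range(3)]
print(c, total)                    # every entry of total is even
```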


Raising each equation (15.6) to the corresponding power $c_i$, and multiplying them all together, we obtain

$$\alpha^2 = \pi_1^{e_1} \cdots \pi_k^{e_k},$$

where

$$\alpha := \prod_{i=1}^{k+1} \alpha_i^{c_i}.$$

Since each $e_i$ is even, we can compute

$$\beta := \pi_1^{e_1/2} \cdots \pi_k^{e_k/2},$$

and we see that $\alpha^2 = \beta^2$, and hence $(\alpha/\beta)^2 = 1$. Thus, $\gamma := \alpha/\beta$ is a square root of 1 in $\mathbb{Z}_n^*$. A more careful analysis (see below) shows that in fact, $\gamma$ is uniformly distributed over all square roots of 1, and hence, with probability at least 1/2, if we compute $\gcd(\mathrm{rep}(\gamma - 1), n)$, we get a non-trivial factor of $n$.

That is the basic idea of the algorithm. There is, however, a technical problem. Namely, in the method outlined above for generating a relation, we attempt to factor $m_i := \mathrm{rep}(\alpha_i^2)$. Thus, the running time of the algorithm will depend in a crucial way on the probability that a random square modulo $n$ is $y$-smooth. Unfortunately for us, Theorem 15.1 does not say anything about this situation; it only applies to the situation where a number is chosen at random from an interval $[1, x]$. There are (at least) three different ways to address this problem:

1. Ignore it, and just assume that the bounds in Theorem 15.1 apply to random 
squares modulo n (taking x := n in the theorem). 

2. Prove a version of Theorem 15.1 that applies to random squares modulo n. 

3. Modify the factoring algorithm, so that Theorem 15.1 applies. 

The first choice, while not unreasonable from a practical point of view, is not very 
satisfying mathematically. It turns out that the second choice is indeed a viable 
option (i.e., the theorem is true and is not so difficult to prove), but we opt for the 
third choice, as it is somewhat easier to carry out, and illustrates a probabilistic 
technique that is more generally useful. 

So here is how we modify the basic algorithm. Instead of generating relations of the form (15.6), we generate relations of the form

$$\alpha_i^2 \delta = \pi_1^{e_{i1}} \cdots \pi_k^{e_{ik}}, \qquad (15.7)$$

for $i = 1, \ldots, k+2$, where $e_{i1}, \ldots, e_{ik} \in \mathbb{Z}$ and $\alpha_i \in \mathbb{Z}_n^*$ for each $i$, and $\delta \in \mathbb{Z}_n^*$. Note that the value $\delta$ is the same in all relations.

We generate these relations as follows. For the very first relation (i.e., $i = 1$), we repeatedly choose $\alpha_1$ and $\delta$ in $\mathbb{Z}_n^*$ at random, until $\mathrm{rep}(\alpha_1^2 \delta)$ is $y$-smooth. Then, after having found the first relation, we find each subsequent relation (i.e., for $i > 1$) by repeatedly choosing $\alpha_i$ in $\mathbb{Z}_n^*$ at random until $\mathrm{rep}(\alpha_i^2 \delta)$ is $y$-smooth, where $\delta$ is the same value that was used in the first relation. Now, Theorem 15.1 will apply directly to determine the success probability of each attempt to generate the first relation. When we have found this relation, the value $\alpha_1^2 \delta$ will be uniformly distributed over all $y$-smooth elements of $\mathbb{Z}_n^*$ (i.e., elements whose integer representations are $y$-smooth). Consider the various cosets of $(\mathbb{Z}_n^*)^2$ in $\mathbb{Z}_n^*$. Intuitively, it is much more likely that a random $y$-smooth element of $\mathbb{Z}_n^*$ lies in a coset that contains many $y$-smooth elements than in a coset with very few, and indeed, it is reasonably likely that the fraction of $y$-smooth elements in the coset containing $\delta$ is not much less than the overall fraction of $y$-smooth elements in $\mathbb{Z}_n^*$. Therefore, for $i > 1$, each attempt to find a relation should succeed with reasonably high probability. This intuitive argument will be made rigorous in the analysis to follow.
The second stage is then modified as follows. For $i = 1, \ldots, k+2$, let $v_i := (e_{i1}, \ldots, e_{ik}, 1) \in \mathbb{Z}^{\times(k+1)}$, and let $\bar{v}_i$ denote the image of $v_i$ in $\mathbb{Z}_2^{\times(k+1)}$. Since $\mathbb{Z}_2^{\times(k+1)}$ is a vector space over the field $\mathbb{Z}_2$ of dimension $k+1$, the family of vectors $\bar{v}_1, \ldots, \bar{v}_{k+2}$ must be linearly dependent. Therefore, we use Gaussian elimination over $\mathbb{Z}_2$ to find a linear dependence among the vectors $\bar{v}_1, \ldots, \bar{v}_{k+2}$, that is, to find integers $c_1, \ldots, c_{k+2} \in \{0, 1\}$, not all zero, such that

$$(e_1, \ldots, e_{k+1}) := c_1 v_1 + \cdots + c_{k+2} v_{k+2} \in 2\mathbb{Z}^{\times(k+1)}.$$

Raising each equation (15.7) to the corresponding power $c_i$, and multiplying them all together, we obtain

$$\alpha^2 \delta^{e_{k+1}} = \pi_1^{e_1} \cdots \pi_k^{e_k},$$

where

$$\alpha := \prod_{i=1}^{k+2} \alpha_i^{c_i}.$$

Since each $e_i$ is even, we can compute

$$\beta := \pi_1^{e_1/2} \cdots \pi_k^{e_k/2} \delta^{-e_{k+1}/2},$$

so that $\alpha^2 = \beta^2$ and $\gamma := \alpha/\beta$ is a square root of 1 in $\mathbb{Z}_n^*$.

The entire algorithm, called Algorithm SEF, is presented in Fig. 15.2. 
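To make Fig. 15.2 concrete, here is a small, unoptimized Python sketch of Algorithm SEF under our own naming (`sef` and its helpers are hypothetical). It assumes $n$ already satisfies the four assumptions above (odd, composite, not a perfect power, no prime factor up to $y$), uses Python's three-argument `pow` (a negative exponent computes a modular inverse, available since Python 3.8), and returns `None` for “failure.” It is meant only for toy inputs.

```python
import math
import random


def primes_up_to(y):
    """Sieve of Eratosthenes: all primes <= y (assumes y >= 2)."""
    flags = bytearray([1]) * (y + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(y) + 1):
        if flags[i]:
            flags[i * i :: i] = bytes(len(range(i * i, y + 1, i)))
    return [i for i, f in enumerate(flags) if f]


def exponents(m, primes):
    """Exponent vector of m over `primes` (trial division), or None if not smooth."""
    exps = []
    for p in primes:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None


def dependency_mod2(vectors):
    """Nonzero c in {0,1}^m with sum(c_i * v_i) even, via elimination over Z_2."""
    pivots = {}
    for i, v in enumerate(vectors):
        row = sum(1 << j for j, e in enumerate(v) if e % 2)
        combo = 1 << i
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = (row, combo)
                break
            row ^= pivots[top][0]
            combo ^= pivots[top][1]
        else:
            return [(combo >> j) & 1 for j in range(len(vectors))]
    return None


def sef(n, y, rng=random):
    """One run of Algorithm SEF; a nontrivial factor of n, or None for "failure"."""
    primes = primes_up_to(y)
    k = len(primes)
    alphas, vecs, delta = [], [], None
    # Stage 1: collect k + 2 relations alpha_i^2 * delta = p_1^e_i1 ... p_k^e_ik.
    while len(vecs) < k + 2:
        while True:
            a = rng.randrange(1, n)
            d = rng.randrange(1, n) if delta is None else delta  # fresh delta only for i = 1
            if math.gcd(a * d, n) != 1:
                continue  # not both units (a gcd > 1 would actually reveal a factor)
            e = exponents(a * a * d % n, primes)
            if e is not None:
                break
        delta = d
        alphas.append(a)
        vecs.append(e + [1])  # v_i = (e_i1, ..., e_ik, 1)
    # Stage 2: c with c_1 v_1 + ... + c_{k+2} v_{k+2} having all entries even.
    c = dependency_mod2(vecs)
    e = [sum(ci * v[j] for ci, v in zip(c, vecs)) for j in range(k + 1)]
    alpha = math.prod(a for ci, a in zip(c, alphas) if ci) % n
    beta = pow(delta, -(e[k] // 2), n)  # delta^(-e_{k+1}/2)
    for p, ej in zip(primes, e[:k]):
        beta = beta * pow(p, ej // 2, n) % n
    gamma = alpha * pow(beta, -1, n) % n  # a square root of 1 in Z_n^*
    if gamma in (1, n - 1):
        return None  # "failure"
    return math.gcd(gamma - 1, n)


# Toy run: n = 101 * 103 and y = 30 (both prime factors exceed y).
results = [sef(10403, 30, random.Random(seed)) for seed in range(10)]
print(results)  # each entry is 101, 103, or None
```

Since each run fails with probability at most 1/2, a handful of independent runs is expected to split this toy $n$.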

Now the analysis. From the discussion above, it is clear that Algorithm SEF 
either outputs “failure,” or outputs a non-trivial factor of n. So we have the same 
three questions to answer as we did in the analysis of Algorithm SEDL: 

1. What is the expected running time of Algorithm SEF? 

2. How should the smoothness parameter y be chosen so as to minimize the 
expected running time? 

3. What is the probability that Algorithm SEF outputs “failure”? 

To answer the first question, let $\sigma$ denote the probability that (the canonical representative of) a random element of $\mathbb{Z}_n^*$ is $y$-smooth. For $i = 1, \ldots, k+2$, let $L_i$ denote the number of iterations of the inner loop in the $i$th iteration of the main loop in stage 1; that is, $L_i$ is the number of attempts made in finding the $i$th relation.

Lemma 15.4. For $i = 1, \ldots, k+2$, we have $E[L_i] \le \sigma^{-1}$.

Proof. We first compute $E[L_1]$. As $\delta$ is chosen uniformly from $\mathbb{Z}_n^*$ and independent of $\alpha_1$, at each attempt to find a relation, $\alpha_1^2 \delta$ is uniformly distributed over $\mathbb{Z}_n^*$,
    i ← 0
    repeat
        i ← i + 1
        repeat
            choose α_i ∈ Z_n^* at random
            if i = 1 then choose δ ∈ Z_n^* at random
            m_i ← rep(α_i^2 δ)
            test if m_i is y-smooth (trial division)
        until m_i = p_1^{e_i1} ··· p_k^{e_ik} for some integers e_i1, ..., e_ik
    until i = k + 2

    set v_i ← (e_i1, ..., e_ik, 1) ∈ Z^{×(k+1)} for i = 1, ..., k + 2
    apply Gaussian elimination over Z_2 to find integers c_1, ..., c_{k+2} ∈ {0, 1},
        not all zero, such that
        (e_1, ..., e_{k+1}) := c_1 v_1 + ··· + c_{k+2} v_{k+2} ∈ 2Z^{×(k+1)}
    α ← ∏_{i=1}^{k+2} α_i^{c_i},  β ← π_1^{e_1/2} ··· π_k^{e_k/2} δ^{−e_{k+1}/2},  γ ← α/β
    if γ = ±1
        then output “failure”
        else output gcd(rep(γ − 1), n)

Fig. 15.2. Algorithm SEF


and hence the probability that the attempt succeeds is precisely $\sigma$. This means $E[L_1] = \sigma^{-1}$.

We next compute $E[L_i]$ for $i > 1$. To this end, let us denote the cosets of $(\mathbb{Z}_n^*)^2$ in $\mathbb{Z}_n^*$ as $C_1, \ldots, C_t$. As it happens, $t = 2^w$, but this fact plays no role in the analysis. For $j = 1, \ldots, t$, let $\sigma_j$ denote the probability that a random element of $C_j$ is $y$-smooth, and let $\tau_j$ denote the probability that the final value of $\delta$ belongs to $C_j$.

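We claim that for $j = 1, \ldots, t$, we have $\tau_j = \sigma_j \sigma^{-1} t^{-1}$. To see this, note that each coset $C_j$ has the same number of elements, namely, $|\mathbb{Z}_n^*| t^{-1}$, and so the number of $y$-smooth elements in $C_j$ is equal to $\sigma_j |\mathbb{Z}_n^*| t^{-1}$. Moreover, the final value of $\alpha_1^2 \delta$ is equally likely to be any one of the $y$-smooth numbers in $\mathbb{Z}_n^*$, of which there are $\sigma |\mathbb{Z}_n^*|$, and since $\alpha_1^2 \delta$ and $\delta$ always lie in the same coset, we have

$$\tau_j = \frac{\sigma_j |\mathbb{Z}_n^*| t^{-1}}{\sigma |\mathbb{Z}_n^*|} = \sigma_j \sigma^{-1} t^{-1},$$

which proves the claim.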
Now, for a fixed value of $\delta$ and a random choice of $\alpha_i \in \mathbb{Z}_n^*$, one sees that $\alpha_i^2 \delta$ is uniformly distributed over the coset containing $\delta$. Therefore, for $j = 1, \ldots, t$, if $\tau_j > 0$, we have

$$E[L_i \mid \delta \in C_j] = \sigma_j^{-1}.$$

Summing over all $j = 1, \ldots, t$ with $\tau_j > 0$, it follows that

$$E[L_i] = \sum_{\tau_j > 0} E[L_i \mid \delta \in C_j] \cdot P[\delta \in C_j] = \sum_{\tau_j > 0} \sigma_j^{-1} \cdot \tau_j = \sum_{\tau_j > 0} \sigma_j^{-1} \cdot \sigma_j \sigma^{-1} t^{-1} \le \sigma^{-1}$$

(the last sum has at most $t$ terms, each equal to $\sigma^{-1} t^{-1}$), which proves the lemma. □


So in stage 1, the expected number of attempts made in generating a single relation is at most $\sigma^{-1}$, each such attempt takes time $k \cdot \mathrm{len}(n)^{O(1)}$, and we have to generate $k+2$ relations, leading to a total expected running time in stage 1 of $\sigma^{-1} k^2 \cdot \mathrm{len}(n)^{O(1)}$. Stage 2 is dominated by the cost of performing Gaussian elimination, which takes time $k^3 \cdot \mathrm{len}(n)^{O(1)}$. Thus, if $Z$ is the total running time of the algorithm, we have

$$E[Z] \le (\sigma^{-1} k^2 + k^3) \cdot \mathrm{len}(n)^{O(1)}.$$


By our assumption that $n$ is not divisible by any primes up to $y$, all $y$-smooth integers up to $n - 1$ are in fact relatively prime to $n$. Therefore, the number of $y$-smooth elements of $\mathbb{Z}_n^*$ is equal to $\Psi(y, n-1)$, and since $n$ itself is not $y$-smooth, this is equal to $\Psi(y, n)$. From this, it follows that

$$\sigma = \Psi(y, n)/|\mathbb{Z}_n^*| \ge \Psi(y, n)/n.$$

The rest of the running time analysis is essentially the same as in the analysis of Algorithm SEDL; that is, assuming $y = \exp[(\log n)^{\lambda + o(1)}]$ for some constant $0 < \lambda < 1$, we obtain

$$E[Z] \le \exp[(1 + o(1))\max\{(\log n/\log y)\log\log n + 2\log y,\ 3\log y\}]. \qquad (15.8)$$

Setting $y = \exp[(1/\sqrt{2})(\log n \log\log n)^{1/2}]$, we obtain

$$E[Z] \le \exp[(2\sqrt{2} + o(1))(\log n \log\log n)^{1/2}].$$

That basically takes care of the first two questions. As for the third, we have:

Lemma 15.5. Algorithm SEF outputs “failure” with probability $2^{-w+1} \le 1/2$.


Proof. Let $F$ be the event that the algorithm outputs “failure.” We may view the final values assigned to $\delta$ and $\alpha_1, \ldots, \alpha_{k+2}$ as random variables, which we shall denote by these same names. Let $\delta' \in \mathbb{Z}_n^*$ and $\alpha'_1, \ldots, \alpha'_{k+2} \in (\mathbb{Z}_n^*)^2$ be arbitrary, fixed values such that $\mathrm{rep}(\alpha'_i \delta')$ is $y$-smooth for $i = 1, \ldots, k+2$. Let $B$ be the event that $\delta = \delta'$ and $\alpha_i^2 = \alpha'_i$ for $i = 1, \ldots, k+2$. We shall show that $P[F \mid B] = 2^{-w+1}$, and since this holds for all relevant $B$, it follows by total probability that $P[F] = 2^{-w+1}$.

For the rest of the argument, we focus on the conditional distribution given $B$. With respect to this conditional distribution, the distribution of each random variable $\alpha_i$ is (essentially) the uniform distribution on $\rho^{-1}(\{\alpha'_i\})$, where $\rho$ is the squaring map on $\mathbb{Z}_n^*$. Moreover, the family of random variables $\{\alpha_i\}_{i=1}^{k+2}$ is mutually independent. Also, the values $\beta$ and $c_1, \ldots, c_{k+2}$ computed by the algorithm are fixed. It follows (see Exercise 8.14) that the distribution of $\alpha$ is (essentially) the uniform distribution on $\rho^{-1}(\{\beta^2\})$, and hence $\gamma := \alpha/\beta$ is a random square root of 1 in $\mathbb{Z}_n^*$. Thus, $\gamma = \pm 1$ with probability $2^{-w+1}$. □

Let us summarize the above discussion in the following theorem. 

Theorem 15.6. With the smoothness parameter set as

$$y := \exp[(1/\sqrt{2})(\log n \log\log n)^{1/2}],$$

the expected running time of Algorithm SEF is at most

$$\exp[(2\sqrt{2} + o(1))(\log n \log\log n)^{1/2}].$$

The probability that Algorithm SEF outputs “failure” is at most 1/2.

Exercise 15.5. It is perhaps a bit depressing that after all that work, Algorithm SEF only succeeds (in the worst case) with probability 1/2. Of course, to reduce the failure probability, we can simply repeat the entire computation; with $\ell$ repetitions, the failure probability drops to $2^{-\ell}$. However, there is a better way to reduce the failure probability. Suppose that in stage 1, instead of collecting $k+2$ relations, we collect $k+1+\ell$ relations, where $\ell \ge 1$ is an integer parameter.

(a) Show that in stage 2, we can use Gaussian elimination over $\mathbb{Z}_2$ to find integer vectors

$$c^{(j)} = (c_1^{(j)}, \ldots, c_{k+1+\ell}^{(j)}) \in \{0, 1\}^{\times(k+1+\ell)} \qquad (j = 1, \ldots, \ell)$$

such that

- over the field $\mathbb{Z}_2$, the images of the vectors $c^{(1)}, \ldots, c^{(\ell)}$ in $\mathbb{Z}_2^{\times(k+1+\ell)}$ form a linearly independent family of vectors, and

- for $j = 1, \ldots, \ell$, we have

$$c_1^{(j)} v_1 + \cdots + c_{k+1+\ell}^{(j)} v_{k+1+\ell} \in 2\mathbb{Z}^{\times(k+1)}.$$
(b) Show that given vectors $c^{(1)}, \ldots, c^{(\ell)}$ as in part (a), if for $j = 1, \ldots, \ell$, we set

$$(e_1^{(j)}, \ldots, e_{k+1}^{(j)}) \leftarrow c_1^{(j)} v_1 + \cdots + c_{k+1+\ell}^{(j)} v_{k+1+\ell},$$

$$\alpha^{(j)} \leftarrow \prod_{i=1}^{k+1+\ell} \alpha_i^{c_i^{(j)}},$$

$$\beta^{(j)} \leftarrow \pi_1^{e_1^{(j)}/2} \cdots \pi_k^{e_k^{(j)}/2} \delta^{-e_{k+1}^{(j)}/2},$$

$$\gamma^{(j)} \leftarrow \alpha^{(j)}/\beta^{(j)},$$

then the family of random variables $\gamma^{(1)}, \ldots, \gamma^{(\ell)}$ is mutually independent, with each $\gamma^{(j)}$ uniformly distributed over the set of all square roots of 1 in $\mathbb{Z}_n^*$, and hence at least one of $\gcd(\mathrm{rep}(\gamma^{(j)} - 1), n)$ splits $n$ with probability at least $1 - 2^{-\ell}$.


So, for example, if we set $\ell = 20$, then the failure probability is reduced to less than one in a million, while the increase in running time over Algorithm SEF will hardly be noticeable.


15.4 Practical improvements 

Our presentation and analysis of algorithms for discrete logarithms and factoring 
were geared towards simplicity and mathematical rigor. However, if one really 
wants to compute discrete logarithms or factor numbers, then a number of important
practical improvements should be considered. In this section, we briefly sketch
some of these improvements, focusing our attention on algorithms for factoring 
numbers (although some of the techniques apply to discrete logarithms as well). 


15.4.1 Better smoothness density estimates 

From an algorithmic point of view, the simplest way to improve the running times of both Algorithms SEDL and SEF is to use a more accurate smoothness density estimate, which dictates a different choice of the smoothness bound $y$ in those algorithms, speeding them up significantly. While our Theorem 15.1 is a valid lower bound on the density of smooth numbers, it is not “tight,” in the sense that the actual density of smooth numbers is somewhat higher. We quote from the literature the following result:

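Theorem 15.7. Let $y$ be a function of $x$ such that for some $\varepsilon > 0$, we have

$$y = \Omega((\log x)^{1+\varepsilon}) \quad \text{and} \quad u := \frac{\log x}{\log y} \to \infty$$

as $x \to \infty$. Then

$$\Psi(y, x) = x \cdot \exp[(-1 + o(1))\, u \log u].$$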
Proof. See §15.5. □
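To get a feel for $\Psi(y, x)$, it can be computed exactly for small $x$ by sieving out the largest prime factor of every integer up to $x$. The sketch below (function names are ours) also evaluates the leading term $x \cdot u^{-u}$ suggested by Theorem 15.7. Since the theorem is asymptotic and hides an $o(1)$ in the exponent, close agreement at such tiny sizes should not be expected; the comparison is only indicative.

```python
import math


def psi(y, x):
    """Number of integers 1 <= m <= x all of whose prime factors are <= y."""
    largest = [1] * (x + 1)          # largest[m]: largest prime factor of m (1 for m = 1)
    for p in range(2, x + 1):
        if largest[p] == 1:          # no smaller prime has marked p, so p is prime
            for m in range(p, x + 1, p):
                largest[m] = p       # primes are visited in increasing order
    return sum(1 for m in range(1, x + 1) if largest[m] <= y)


def leading_term(y, x):
    """x * u**(-u) with u = log x / log y: the main term of Theorem 15.7."""
    u = math.log(x) / math.log(y)
    return x * u ** (-u)


for x in (10**4, 10**5):
    print(x, psi(30, x), round(leading_term(30, x)))
```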

Let us apply this result to the analysis of Algorithm SEF. Assume that

$$y = \exp[(\log n)^{1/2 + o(1)}].$$

Our choice of $y$ will in fact be of this form. With this assumption, we have $\log\log y = (1/2 + o(1))\log\log n$, and using Theorem 15.7, we can improve the inequality (15.8), obtaining instead (as the reader may verify)

$$E[Z] \le \exp[(1 + o(1))\max\{\tfrac{1}{2}(\log n/\log y)\log\log n + 2\log y,\ 3\log y\}].$$

From this, if we set

$$y := \exp[\tfrac{1}{2}(\log n \log\log n)^{1/2}],$$

we obtain

$$E[Z] \le \exp[(2 + o(1))(\log n \log\log n)^{1/2}].$$

An analogous improvement can be obtained for Algorithm SEDL.

Although this improvement only reduces the constant $2\sqrt{2} \approx 2.828$ to 2, the constant is in the exponent, and so this improvement is not to be scoffed at!


15.4.2 The quadratic sieve algorithm 

We now describe a practical improvement to Algorithm SEF. This algorithm, 
known as the quadratic sieve, is faster in practice than Algorithm SEF; however, 
its analysis is somewhat heuristic. 

First, let us return to the simplified version of Algorithm SEF, where we collect relations of the form (15.6). Furthermore, instead of choosing the values $\alpha_i$ at random, we will choose them in a special way, as we now describe. Let

$$h := \lfloor \sqrt{n} \rfloor,$$

and define the polynomial

$$F := (X + h)^2 - n \in \mathbb{Z}[X].$$

In addition to the usual “smoothness parameter” $y$, we need a “sieving parameter” $z$, whose choice will be discussed below. We shall assume that both $y$ and $z$ are of the form $\exp[(\log n)^{1/2 + o(1)}]$, and our ultimate choices of $y$ and $z$ will indeed satisfy this assumption.

For all $s = 1, 2, \ldots, \lfloor z \rfloor$, we shall determine which values of $s$ are “good,” in the sense that the corresponding value $F(s)$ is $y$-smooth. For each good $s$, since we have $F(s) \equiv (s + h)^2 \pmod{n}$, we obtain one relation of the form (15.6), with $\alpha_i := [s + h]_n$. If we find at least $k+1$ good values of $s$, then we can apply Gaussian elimination as usual to find a square root $\gamma$ of 1 in $\mathbb{Z}_n^*$. Hopefully, we will have $\gamma \ne \pm 1$, allowing us to split $n$.

Observe that for $1 \le s \le z$, we have

$$1 \le F(s) \le z^2 + 2zn^{1/2} \le n^{1/2 + o(1)}$$

(the lower bound holds because $n$ is not a perfect square, so $(h+1)^2 > n$). Now, although the values $F(s)$ are not at all random, we might expect heuristically that the number of good $s$ up to $z$ is roughly equal to $\hat{\sigma} z$, where $\hat{\sigma}$ is the probability that a random integer in the interval $[1, n^{1/2}]$ is $y$-smooth, and by Theorem 15.7, we have

$$\hat{\sigma} = \exp[(-\tfrac{1}{4} + o(1))(\log n/\log y)\log\log n].$$

If our heuristics are valid, this already yields an improvement over Algorithm SEF, since now we are looking for $y$-smooth numbers near $n^{1/2}$, which are much more common than $y$-smooth numbers near $n$. But there is another improvement possible; namely, instead of testing each individual number $F(s)$ for smoothness using trial division, we can test them all at once using the following “sieving procedure.”

The sieving procedure works as follows. First, we create an array $v[1 \ldots \lfloor z \rfloor]$, and initialize $v[s]$ to $F(s)$, for $1 \le s \le z$. Then, for each prime $p$ up to $y$, we do the following:

1. Compute the roots of the polynomial $F$ modulo $p$.

This can be done quite efficiently, as follows. For $p = 2$, $F$ has exactly one root modulo $p$, which is determined by the parity of $h$. For $p > 2$, we may use the familiar quadratic formula together with an algorithm for computing square roots modulo $p$, as discussed in Exercise 12.7. A quick calculation shows that the discriminant of $F$ is $4n$, and thus, $F$ has a root modulo $p$ if and only if $n$ is a quadratic residue modulo $p$, in which case it will have two roots (under our usual assumptions, we cannot have $p \mid n$).

2. Assume that $F$ has $v_p$ distinct roots modulo $p$ lying in the interval $[1, p]$; call them $r_1, \ldots, r_{v_p}$.

Note that $v_p = 1$ for $p = 2$ and $v_p \in \{0, 2\}$ for $p > 2$. Also note that $F(s) \equiv 0 \pmod{p}$ if and only if $s \equiv r_i \pmod{p}$ for some $i = 1, \ldots, v_p$.

For $i = 1, \ldots, v_p$, do the following:

    s ← r_i
    while s ≤ z do
        repeat v[s] ← v[s]/p until p ∤ v[s]
        s ← s + p
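Step 1 can be sketched in Python as below. The text defers square roots modulo $p$ to Exercise 12.7; here we substitute the standard Tonelli-Shanks algorithm, which may differ from that exercise's approach. Writing $F = X^2 + 2hX + (h^2 - n)$, the quadratic formula gives the roots $-h \pm \sqrt{n}$ modulo $p$, and each root is reported as a representative in $[1, p]$, as in step 2. The function names are our own.

```python
import math


def sqrt_mod(a, p):
    """Tonelli-Shanks: some t with t^2 = a (mod p) for an odd prime p,
    or None if a is a quadratic non-residue modulo p."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:          # Euler's criterion
        return None
    q, s = p - 1, 0
    while q % 2 == 0:
        q, s = q // 2, s + 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:   # find a non-residue z
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:                             # invariant: r^2 = a * t (mod p)
        i, t2 = 0, t
        while t2 != 1:                        # least i with t^(2^i) = 1
            t2, i = t2 * t2 % p, i + 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r


def roots_of_F(n, p):
    """Roots in [1, p] of F(X) = (X + h)^2 - n modulo the prime p, h = isqrt(n)."""
    h = math.isqrt(n)
    if p == 2:
        r = (n - h) % 2          # over Z_2, u^2 = u, so F(X) = X + h - n (mod 2)
        return [r if r else 2]
    t = sqrt_mod(n, p)
    if t is None:
        return []                # n is a non-residue mod p: no roots
    return sorted({(t - h) % p or p, (-t - h) % p or p})


for p in (2, 3, 5, 7, 11, 13):
    print(p, roots_of_F(10403, p))
```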


At the end of this sieving procedure, the good values of $s$ may be identified as precisely those such that $v[s] = 1$. The running time of this sieving procedure is at most $\mathrm{len}(n)^{O(1)}$ times

$$\sum_{p \le y} \frac{z}{p} = z \sum_{p \le y} \frac{1}{p} = O(z \log\log y) = z^{1+o(1)}.$$

Here, we have made use of Theorem 5.10, although this is not really necessary; for our purposes, the bound $\sum_{p \le y} 1/p = O(\log y)$ would suffice. Note that this sieving procedure is a factor of $k^{1+o(1)}$ faster than the method for finding smooth numbers based on trial division. With just a little extra book-keeping, we can not only identify the good values of $s$ but also compute the factorization of $F(s)$ into primes, at essentially no extra cost.
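Putting the pieces together, the sieve itself might look like the following sketch (names are ours). To keep the block self-contained it finds the roots of $F$ modulo each prime by brute force over $r = 1, \ldots, p$, which is adequate only for small $y$; a real implementation would compute them as in step 1. It requires that $n$ is not a perfect square, so that every $F(s)$ is positive, and it returns the good values of $s$.

```python
import math


def qs_sieve(n, y, z):
    """Return the s in [1, z] for which F(s) = (s + h)^2 - n is y-smooth."""
    h = math.isqrt(n)
    if h * h == n:
        raise ValueError("n must not be a perfect square")
    F = lambda s: (s + h) ** 2 - n
    v = [0] + [F(s) for s in range(1, z + 1)]        # v[s] = F(s), 1 <= s <= z
    for p in range(2, y + 1):
        if any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
            continue                                 # skip composite p
        for r in range(1, p + 1):                    # roots of F mod p in [1, p]
            if F(r) % p:
                continue
            for s in range(r, z + 1, p):             # every s = r (mod p)
                while v[s] % p == 0:                 # divide out all factors of p
                    v[s] //= p
    return [s for s in range(1, z + 1) if v[s] == 1]


good = qs_sieve(10403, 30, 10)
print(good)    # [1, 5]: F(1) = 1 and F(5) = 833 = 7^2 * 17
```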

Now, let us put together all the pieces. We have to choose $z$ just large enough so as to find at least $k+1$ good values of $s$ up to $z$. So we should choose $z$ so that $z \approx k/\hat{\sigma}$; in practice, we could choose an initial estimate for $z$, and if this choice of $z$ does not yield enough relations, we could keep doubling $z$ until we do get enough relations. Assuming that $z \approx k/\hat{\sigma}$, the cost of sieving is $(k/\hat{\sigma})^{1+o(1)}$, or

$$\exp[(1 + o(1))(\tfrac{1}{4}(\log n/\log y)\log\log n + \log y)].$$

The cost of Gaussian elimination is still $O(k^3)$, or

$$\exp[(3 + o(1))\log y].$$

Thus, the total running time is bounded by

$$\exp[(1 + o(1))\max\{\tfrac{1}{4}(\log n/\log y)\log\log n + \log y,\ 3\log y\}].$$

Let $\mu := \log y$, $A := (1/4)\log n \log\log n$, $S_1 := A/\mu + \mu$ and $S_2 := 3\mu$, and let us find the value of $\mu$ that minimizes $\max\{S_1, S_2\}$. Using a little calculus, one finds that $S_1$ is minimized at $\mu = A^{1/2}$. For this value of $\mu$, we have $S_1 = 2A^{1/2}$ and $S_2 = 3A^{1/2} > S_1$, and so this choice of $\mu$ is a bit larger than optimal. For $\mu < A^{1/2}$, $S_1$ is decreasing (as a function of $\mu$), while $S_2$ is always increasing. It follows that the optimal value of $\mu$ is obtained by setting

$$A/\mu + \mu = 3\mu,$$

and solving for $\mu$. This yields $\mu = (A/2)^{1/2}$. So setting

$$y := \exp[(1/(2\sqrt{2}))(\log n \log\log n)^{1/2}],$$

the total running time of the quadratic sieve factoring algorithm is bounded by

$$\exp[(3/(2\sqrt{2}) + o(1))(\log n \log\log n)^{1/2}].$$

Thus, we have reduced the constant in the exponent from 2 (for Algorithm SEF with the more accurate smoothness density estimates) to $3/(2\sqrt{2}) \approx 1.061$.
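The crossover argument can be double-checked numerically. The sketch below (our own helper names; $A$ is an arbitrary test value) minimizes $\max\{A/\mu + \mu, c\mu\}$ by brute force over a fine grid, where $c$ is the constant in the stage-2 cost $\exp[(c + o(1))\log y]$. For $c = 3$ the minimum should sit at the crossover $\mu = (A/2)^{1/2}$ with value $3(A/2)^{1/2}$; since $A = (1/4)\log n \log\log n$, that value equals $(3/(2\sqrt{2}))(\log n \log\log n)^{1/2}$. For $c = 2$ the minimum is at $\mu = A^{1/2}$ with value $2A^{1/2}$.

```python
import math


def cost(A, mu, c):
    """max(S_1, S_2) with S_1 = A/mu + mu and S_2 = c*mu (o(1) terms ignored)."""
    return max(A / mu + mu, c * mu)


def grid_minimum(A, c, steps=100_000):
    """Brute-force minimizer of cost(A, mu, c) over a grid on (0, 2*sqrt(A)]."""
    top = 2 * math.sqrt(A)
    mus = (top * i / steps for i in range(1, steps + 1))
    mu = min(mus, key=lambda m: cost(A, m, c))
    return mu, cost(A, mu, c)


A = 50.0                                  # arbitrary test value
mu3, val3 = grid_minimum(A, c=3)          # expect mu = sqrt(A/2) = 5
mu2, val2 = grid_minimum(A, c=2)          # expect mu = sqrt(A)
print(mu3, math.sqrt(A / 2), val3)        # val3 is about 3*sqrt(A/2) = 15
print(mu2, math.sqrt(A), val2)            # val2 is about 2*sqrt(A)
```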
We mention one final improvement. The matrix to which we apply Gaussian elimination in stage 2 is “sparse”; indeed, since any integer less than $n$ has $O(\log n)$ prime factors, the total number of non-zero entries in the matrix is $k^{1+o(1)}$. There are special algorithms for working with such sparse matrices, which allow us to perform stage 2 of the factoring algorithm in time $k^{2+o(1)}$, or

$$\exp[(2 + o(1))\log y].$$

Setting

$$y := \exp[\tfrac{1}{2}(\log n \log\log n)^{1/2}],$$

the total running time is bounded by

$$\exp[(1 + o(1))(\log n \log\log n)^{1/2}].$$

Thus, this improvement reduces the constant in the exponent from $3/(2\sqrt{2}) \approx 1.061$ to 1. Moreover, the special algorithms designed to work with sparse matrices typically use much less space than ordinary Gaussian elimination (even if the input to Gaussian elimination is sparse, the intermediate matrices will not be). We shall discuss in detail later, in §18.4, one such algorithm for solving sparse systems of linear equations.
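Because every bound above has the shape $\exp[(c + o(1))(\log n \log\log n)^{1/2}]$, the constants can be compared by evaluating $c\,(\log n \log\log n)^{1/2}$ for a concrete size of $n$ (with $\log$ the natural logarithm). The sketch below reports the result in bits; it drops the $o(1)$ terms entirely, so the numbers indicate relative growth only and are not operation counts. The dictionary labels are our own.

```python
import math

CONSTANTS = {
    "SEF, Theorem 15.6": 2 * math.sqrt(2),
    "SEF, sharper density estimate": 2.0,
    "quadratic sieve, dense elimination": 3 / (2 * math.sqrt(2)),
    "quadratic sieve, sparse elimination": 1.0,
}


def log2_work(bits, c):
    """log_2 of exp[c (ln n ln ln n)^(1/2)] for n = 2**bits, ignoring o(1)."""
    ln_n = bits * math.log(2)
    return c * math.sqrt(ln_n * math.log(ln_n)) / math.log(2)


for name, c in CONSTANTS.items():
    print(f"{name:38s} c = {c:.3f}  ~2^{log2_work(512, c):.0f} for a 512-bit n")
```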


The quadratic sieve may fail to factor n, for one of two reasons: first, it may 
fail to find k + 1 relations; second, it may find these relations, but in stage 2, it 
finds only a trivial square root of 1. There is no rigorous theory to say why the 
algorithm should not fail for one of these two reasons, but experience shows that 
the algorithm does indeed work as expected. 


15.5 Notes 

Many of the algorithmic ideas in this chapter were first developed for the problem 
of factoring integers, and then later adapted to the discrete logarithm problem. 
The first (heuristic) subexponential-time algorithm for factoring integers, called 
the continued fraction method (not discussed here), was introduced by Lehmer 
and Powers [59], and later refined and implemented by Morrison and Brillhart 
[70]. The first rigorously analyzed subexponential-time algorithm for factoring 
integers was introduced by Dixon [35]. Algorithm SEF is a variation of Dixon’s 
algorithm, which works the same way as Algorithm SEF, except that it generates 
relations of the form (15.6) directly (and indeed, it is possible to prove a variant 
of Theorem 15.1, and for that matter, Theorem 15.7, for random squares modulo
n). Algorithm SEF is based on an idea suggested by Rackoff (personal communication).

Theorem 15.7 was proved by Canfield, Erdős, and Pomerance [23].

The quadratic sieve was introduced by Pomerance [78]. Recall that the quadratic
sieve has a heuristic running time of

$$\exp\left[(1 + o(1))(\log n \log\log n)^{1/2}\right].$$

This running time bound can also be achieved rigorously by a result of Lenstra 
and Pomerance [61], and to date, this is the best rigorous running time bound for 
factoring algorithms. We should stress, however, that most practitioners in this
field are not so much interested in rigorous running time analyses as they are in
actually factoring integers, and, for such purposes, heuristic running time estimates
are quite acceptable. Indeed, the quadratic sieve is much more practical than the
algorithm in [61], which is mainly of theoretical interest.

There are two other factoring algorithms not discussed here, but that should
at least be mentioned. The first is the elliptic curve method, introduced
by Lenstra [60]. Unlike all of the other known subexponential-time algorithms, the
running time of this algorithm is sensitive to the sizes of the factors of n; in particular,
if p is the smallest prime dividing n, the algorithm will find p (heuristically)
in expected time

$$\exp\left[(\sqrt{2} + o(1))(\log p \log\log p)^{1/2}\right] \cdot \operatorname{len}(n)^{O(1)}.$$

This algorithm is quite practical, and is the method of choice when it is known 
(or suspected) that n has some small factors. It also has the advantage that it uses 
only polynomial space (unlike all of the other known subexponential-time factoring 
algorithms). 

The second is the number field sieve, the basic idea of which was introduced by 
Pollard [77], and later generalized and refined by Buhler, Lenstra, and Pomerance 
[21], as well as by others. The number field sieve will split n (heuristically) in 
expected time 

$$\exp\left[(c + o(1))(\log n)^{1/3}(\log\log n)^{2/3}\right],$$

where c is a constant (currently, the smallest value of c is 1.902, a result due to 
Coppersmith [27]). The number field sieve is currently the asymptotically fastest 
known factoring algorithm (at least, heuristically), and it is also practical, having 
been used to set the latest factoring record: the factorization of a 200-decimal-digit
integer that is the product of two primes of about the same size. See the web
page www.crypto-world.com/FactorRecords.html for more details (as well 
as for announcements of new records). 

As for subexponential-time algorithms for discrete logarithms, Adleman [1] 
adapted the ideas used for factoring to the discrete logarithm problem, although 
it seems that some of the basic ideas were known much earlier. Algorithm SEDL 
is a variation on this algorithm, and the basic technique is usually referred to as the 
index calculus method. The basic idea of the number field sieve was adapted to the
discrete logarithm problem by Gordon [42]; see also Adleman [2] and Schirokauer,
Weber, and Denny [84].

For many more details and references for subexponential-time algorithms for 
factoring and discrete logarithms, see Chapter 6 of Crandall and Pomerance [30]. 
Also, see the web page www.crypto-world.com/FactorWorld.html for links 
to research papers and implementation reports. 

For more details regarding the security of signature schemes, as discussed following
Exercise 15.4, see the paper by Bellare and Rogaway [13].

Last, but not least, we should mention the fact that there are in fact polynomial-time
algorithms for factoring and for computing discrete logarithms; however,
these algorithms require special hardware, namely, a quantum computer. Shor
[92, 93] showed that these problems could be solved in polynomial time on such a
device; however, at the present time, it is unclear when and if such machines will
ever be built. Much, indeed most, of modern-day cryptography will crumble if this
happens, or if efficient “classical” algorithms for these problems are discovered
(which is still a real possibility).



16 

More rings 


This chapter develops a number of more advanced concepts concerning rings.
These concepts will play important roles later in the text, and we prefer to discuss
them now, so as to avoid too many interruptions of the flow of subsequent
discussions.


16.1 Algebras 

Throughout this section, R denotes a ring (i.e., a commutative ring with unity). 

Sometimes, a ring may also be naturally viewed as an R-module, in which case,
both the theory of rings and the theory of modules may be brought to bear to study
its properties.

Definition 16.1. An R-algebra is a set E, together with addition and multiplication
operations on E, and a function μ : R × E → E, such that

(i) with respect to addition and multiplication, E forms a ring;

(ii) with respect to addition and the scalar multiplication map μ, E forms an
R-module;

(iii) for all c ∈ R, and α, β ∈ E, we have

$$\mu(c, \alpha)\beta = \mu(c, \alpha\beta) = \alpha\mu(c, \beta).$$

An R-algebra E may also be called an algebra over R. As we usually do for
R-modules, we shall write cα (or c · α) instead of μ(c, α). When we do this, part
(iii) of the definition states that

$$(c\alpha)\beta = c(\alpha\beta) = \alpha(c\beta)$$

for all c ∈ R and α, β ∈ E. In particular, we may write cαβ without any ambiguity.
Note that there are two multiplication operations at play here: scalar multiplication
(such as cα), and ring multiplication (such as αβ). Also note that since we are
assuming E is commutative, the second equality in part (iii) is already implied
by the first. A simple consequence of the definition is that for all c, d ∈ R and
α, β ∈ E, we have (cα)(dβ) = (cd)(αβ). From this, it follows that for all c ∈ R,
α ∈ E, and k > 0, we have (cα)^k = c^k α^k.

Example 16.1. Suppose E is a ring and τ : R → E is a ring homomorphism. With
scalar multiplication defined by cα := τ(c)α for c ∈ R and α ∈ E, one may easily
check that E is indeed an R-algebra. In this case, we say that E is an R-algebra
via the map τ. □

Example 16.2. If R is a subring of E, then with τ : R → E being the inclusion
map, we can view E as an R-algebra as in the previous example. In this case, we
say that E is an R-algebra via inclusion. □

Example 16.3. If τ : R → E is a natural embedding of rings, then by a slight
abuse of terminology, just as we sometimes say that R is a subring of E, we shall
also say that E is an R-algebra via inclusion. □

In fact, all R-algebras can be viewed as special cases of Example 16.1: 

Theorem 16.2. If E is an R-algebra, then the map

$$\tau : R \to E, \quad c \mapsto c \cdot 1_E,$$

is a ring homomorphism, and cα = τ(c)α for all c ∈ R and α ∈ E.

Proof. Exercise. □ 

In the special situation where R is a field, we can say even more. In this situation,
and with τ as in the above theorem, either E is trivial or τ is injective (see
Exercise 7.47). In the latter case, E contains an isomorphic copy of R as a subring.
To summarize:

Theorem 16.3. If R is a field, then an R-algebra is either the trivial ring or contains
an isomorphic copy of R as a subring.

The following examples give further important constructions of R-algebras. 

Example 16.4. If E_1, ..., E_k are R-algebras, then their direct product E_1 × ··· × E_k
is an R-algebra as well, where addition, multiplication, and scalar multiplication
are defined component-wise. As usual, if E = E_1 = ··· = E_k, we write this as
E^{×k}. □

Example 16.5. If I is an arbitrary set, and E is an R-algebra, then Map(I, E),
which is the set of all functions f : I → E, may be naturally viewed as an
R-algebra, with addition, multiplication, and scalar multiplication defined pointwise. □

Example 16.6. Let E be an R-algebra and let I be an ideal of E. Then it is easily
verified that I is also a submodule of E. This means that the quotient ring E/I
may also be viewed as an R-module, and indeed, it is an R-algebra, called the
quotient algebra (over R) of E modulo I. For α, β ∈ E and c ∈ R, addition,
multiplication, and scalar multiplication in E/I are defined as follows:

$$[\alpha]_I + [\beta]_I := [\alpha + \beta]_I, \quad [\alpha]_I \cdot [\beta]_I := [\alpha \cdot \beta]_I, \quad c \cdot [\alpha]_I := [c \cdot \alpha]_I.$$

□

Example 16.7. The ring of polynomials R[X] is an R-algebra via inclusion. Let
f ∈ R[X] be a non-zero polynomial with lc(f) ∈ R*. We may form the quotient
ring E := R[X]/(f), which may naturally be viewed as an R-algebra, as in the
previous example. If deg(f) = 0, then E is trivial; so assume deg(f) > 0, and
consider the map

$$\tau : R \to E, \quad c \mapsto c \cdot 1_E$$

from Theorem 16.2. By definition, τ(c) = [c]_f. As discussed in Example 7.55, the
map τ is a natural embedding of rings, and so by identifying R with its image in
E under τ, we can view R as a subring of E; therefore, we can also view E as an
R-algebra via inclusion. □
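To make the quotient algebra concrete, here is a minimal sketch assuming R = ℤ_p for a prime p and a monic f (a harmless normalization, since (f) = (lc(f)^{-1} f) when lc(f) is a unit). Each element of E = R[X]/(f) is stored as its unique representative of degree less than deg(f), as a coefficient list with the lowest degree first. Addition and multiplication are done on polynomials and then reduced modulo f. Scalar multiplication by c is multiplication by τ(c) = [c]_f. All function names are hypothetical.

```python
def poly_mod(g, f, p):
    """Reduce g modulo the monic polynomial f over Z_p.

    Polynomials are coefficient lists, lowest degree first. The result has
    exactly deg(f) entries: the canonical representative of [g]_f.
    """
    assert f and f[-1] % p == 1, "f must be monic"
    d = len(f) - 1                                  # deg(f)
    g = [c % p for c in g]
    for i in range(len(g) - 1, d - 1, -1):          # clear coefficients of X^i, i >= d
        c = g[i]
        if c:
            for j in range(d + 1):                  # subtract c * X^(i-d) * f
                g[i - d + j] = (g[i - d + j] - c * f[j]) % p
    return g[:d] + [0] * (d - len(g))               # pad short inputs with zeros


def q_add(a, b, f, p):
    """[a]_f + [b]_f in Z_p[X]/(f)."""
    n = max(len(a), len(b))
    s = [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]
    return poly_mod(s, f, p)


def q_mul(a, b, f, p):
    """[a]_f * [b]_f: multiply as polynomials (a, b non-empty), then reduce modulo f."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return poly_mod(prod, f, p)


def q_scale(c, a, f, p):
    """Scalar multiplication c * [a]_f, computed as tau(c) * [a]_f = [c]_f * [a]_f."""
    return q_mul([c], a, f, p)
```

With p = 2 and f = X^2 + X + 1 (the list [1, 1, 1]), the four representatives [0, 0], [1, 0], [0, 1], [1, 1] form a four-element algebra over ℤ_2 in which ξ = [X]_f satisfies ξ^2 = ξ + 1, so that f(ξ) = 0.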


Subalgebras 

Let E be an R-algebra. A subset S of E is called a subalgebra (over R) of E if it
is both a subring of E and a submodule of E. This means that S contains 1_E, and
is closed under addition, multiplication, and scalar multiplication; restricting these
operations to S, we may view S as an R-algebra in its own right.

The following theorem gives a simple but useful characterization of subalgebras, 
in relation to subrings: 

Theorem 16.4. If E is an R-algebra via inclusion, and S is a subring of E, then
S is a subalgebra if and only if S contains R. More generally, if E is an arbitrary
R-algebra, and S is a subring of E, then S is a subalgebra of E if and only if S
contains c · 1_E for all c ∈ R.


Proof. Exercise. □ 

R-algebra homomorphisms

Let E and E' be R-algebras. A function ρ : E → E' is called an R-algebra
homomorphism if ρ is both a ring homomorphism and an R-linear map. This
means that ρ(1_E) = 1_{E'} and

$$\rho(\alpha + \beta) = \rho(\alpha) + \rho(\beta), \quad \rho(\alpha\beta) = \rho(\alpha)\rho(\beta), \quad \text{and} \quad \rho(c\alpha) = c\rho(\alpha)$$

for all α, β ∈ E and all c ∈ R. As usual, if ρ is bijective, then it is called an
R-algebra isomorphism, and if, in addition, E = E', it is called an R-algebra
automorphism.

The following theorem gives a simple but useful characterization of R-algebra 
homomorphisms, in relation to ring homomorphisms: 

Theorem 16.5. If E and E' are R-algebras via inclusion, and ρ : E → E' is
a ring homomorphism, then ρ is an R-algebra homomorphism if and only if the
restriction of ρ to R is the identity map. More generally, if E and E' are arbitrary
R-algebras and ρ : E → E' is a ring homomorphism, then ρ is an R-algebra
homomorphism if and only if ρ(c · 1_E) = c · 1_{E'} for all c ∈ R.

Proof. Exercise. □ 

Example 16.8. If E is an R-algebra and I is an ideal of E, then as observed in
Example 16.6, I is also a submodule of E, and we may form the quotient algebra
E/I. The natural map

$$\rho : E \to E/I, \quad \alpha \mapsto [\alpha]_I$$

is both a ring homomorphism and an R-linear map, and hence is an R-algebra
homomorphism. □

Example 16.9. Since ℂ contains ℝ as a subring, we may naturally view ℂ as an
ℝ-algebra via inclusion. The complex conjugation map on ℂ that sends a + bi to
a − bi, for a, b ∈ ℝ, is an ℝ-algebra automorphism on ℂ (see Example 7.5). □

Many simple facts about R-algebra homomorphisms can be obtained by combining
corresponding facts for ring and R-module homomorphisms. For example,
the composition of two R-algebra homomorphisms is again an R-algebra homomorphism,
since the composition is both a ring homomorphism and an R-linear
map (Theorems 7.22 and 13.6). As another example, if ρ : E → E' is an R-algebra
homomorphism, then its image S' is both a subring and a submodule of
E', and hence, S' is a subalgebra of E'. The kernel K of ρ is an ideal of E, and
we may form the quotient algebra E/K. The first isomorphism theorems for rings
and modules (Theorems 7.26 and 13.9) tell us that E/K and S' are isomorphic
both as rings and as R-modules, and hence, they are isomorphic as R-algebras.
Specifically, the map

$$\bar{\rho} : E/K \to E', \quad [\alpha]_K \mapsto \rho(\alpha)$$

is an injective R-algebra homomorphism whose image is S'.

The following theorem isolates an important subalgebra associated with any R-algebra
homomorphism ρ : E → E.

Theorem 16.6. Let E be an R-algebra, and let ρ : E → E be an R-algebra
homomorphism. Then the set S := {α ∈ E : ρ(α) = α} is a subalgebra of E,
called the subalgebra of E fixed by ρ. Moreover, if E is a field, then so is S.

Proof. Let us verify that S is closed under addition. If α, β ∈ S, then we have

$$\begin{aligned} \rho(\alpha + \beta) &= \rho(\alpha) + \rho(\beta) && \text{(since } \rho \text{ is a group homomorphism)} \\ &= \alpha + \beta && \text{(since } \alpha, \beta \in S\text{).} \end{aligned}$$

Using the fact that ρ is a ring homomorphism, one can similarly show that S is
closed under multiplication, and that 1_E ∈ S. Likewise, using the fact that ρ is an
R-linear map, one can also show that S is closed under scalar multiplication.

This shows that S is a subalgebra, proving the first statement. For the second
statement, suppose that E is a field. Let α be a non-zero element of S, and suppose
β ∈ E is its multiplicative inverse, so that αβ = 1_E. We want to show that β lies in
S. Again, using the fact that ρ is a ring homomorphism, we have

$$\alpha\beta = 1_E = \rho(1_E) = \rho(\alpha\beta) = \rho(\alpha)\rho(\beta) = \alpha\rho(\beta),$$

and hence αβ = αρ(β); canceling α, we obtain β = ρ(β), and so β ∈ S. □

Example 16.10. The subalgebra of ℂ fixed by the complex conjugation map is ℝ. □


Polynomial evaluation 

Let E be an R-algebra. Consider the ring of polynomials R[X] (which is an R-algebra
via inclusion). Any polynomial g ∈ R[X] naturally defines a function on
E: if g = Σ_i a_i X^i, with each a_i ∈ R, and α ∈ E, then

$$g(\alpha) := \sum_i a_i \alpha^i.$$

Just as for rings, we say that α is a root of g if g(α) = 0_E.

For fixed α ∈ E, the polynomial evaluation map

$$\rho : R[X] \to E, \quad g \mapsto g(\alpha)$$

is easily seen to be an R-algebra homomorphism. The image of ρ is denoted R[α],
and is a subalgebra of E. Indeed, R[α] is the smallest subalgebra of E containing
α, and is called the subalgebra (over R) generated by α. Note that if E is
an R-algebra via inclusion, then the notation R[α] has the same meaning as that
introduced in Example 7.44.

We next state a very simple, but extremely useful, fact: 

Theorem 16.7. Let ρ : E → E' be an R-algebra homomorphism. Then for all
g ∈ R[X] and α ∈ E, we have

$$\rho(g(\alpha)) = g(\rho(\alpha)).$$

Proof. Let g = Σ_i a_i X^i ∈ R[X]. Then we have

$$\rho(g(\alpha)) = \rho\Big(\sum_i a_i \alpha^i\Big) = \sum_i \rho(a_i \alpha^i) = \sum_i a_i \rho(\alpha^i) = \sum_i a_i \rho(\alpha)^i = g(\rho(\alpha)).$$

□

As a special case of Theorem 16.7, if E = R[α] for some α ∈ E, then every
element of E can be expressed as g(α) for some g ∈ R[X], and ρ(g(α)) = g(ρ(α));
hence, the action of ρ is completely determined by its action on α.
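Theorem 16.7 is easy to test numerically in a simple case. Take R = E = ℤ and E' = ℤ_m. Reduction modulo m is a ring homomorphism, hence a ℤ-algebra homomorphism (compare Exercise 16.2). So evaluating g in ℤ and then reducing must agree with reducing first and then evaluating in ℤ_m. The sketch below uses Horner's rule for evaluation; the example and the function names are illustrative choices, not part of the text.

```python
def horner(coeffs, x, mod=None):
    """Evaluate sum(coeffs[i] * x**i) by Horner's rule.

    If `mod` is given, reduce after every step, i.e. evaluate in Z_mod.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
        if mod is not None:
            acc %= mod
    return acc


def check_theorem_16_7(coeffs, a, m):
    """Compare rho(g(a)) with g(rho(a)) for rho : Z -> Z_m, a |-> a mod m."""
    lhs = horner(coeffs, a) % m           # evaluate in E = Z, then apply rho
    rhs = horner(coeffs, a % m, mod=m)    # apply rho first, then evaluate in E' = Z_m
    return lhs == rhs
```

Reducing after every Horner step keeps the intermediate values small, and Theorem 16.7 is what guarantees that this shortcut gives the same answer as evaluating over ℤ first.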

Example 16.11. Let f ∈ R[X] be a non-zero polynomial with lc(f) ∈ R*. As in
Example 16.7, we may form the quotient algebra E := R[X]/(f).

Let ξ := [X]_f ∈ E. Then E = R[ξ], and moreover, every element of E can be
expressed uniquely as g(ξ), where g ∈ R[X] and deg(g) < deg(f). In addition, ξ
is a root of f. If deg(f) > 0, these facts were already observed in Example 7.55,
and otherwise, they are trivial.

Now let E' be any R-algebra, and suppose that ρ : E → E' is an R-algebra
homomorphism, and let ξ' := ρ(ξ). By the previous theorem, ρ sends g(ξ)
to g(ξ') for each g ∈ R[X]. Thus, the image of ρ is R[ξ']. Also, we have
f(ξ') = f(ρ(ξ)) = ρ(f(ξ)) = ρ(0_E) = 0_{E'}. Therefore, ξ' must be a root of f.

Conversely, suppose that ξ' ∈ E' is a root of f. Then the polynomial evaluation
map from R[X] to E' that sends g ∈ R[X] to g(ξ') ∈ E' is an R-algebra
homomorphism whose kernel contains f. Using the generalized versions of the
first isomorphism theorems for rings and R-modules (Theorems 7.27 and 13.10),
we obtain the R-algebra homomorphism

$$\rho : E \to E', \quad g(\xi) \mapsto g(\xi').$$

One sees that complex conjugation is just a special case of this construction (see
Example 7.57). □
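Here is a finite analog of complex conjugation, assuming R = ℤ_3 and f = X^2 + 1. This f has no root in ℤ_3, since the squares in ℤ_3 are 0 and 1. Both ξ = [X]_f and ξ' = −ξ are roots of f in E = ℤ_3[X]/(f), so Example 16.11 gives a ℤ_3-algebra homomorphism E → E sending a + bξ to a − bξ. The sketch below represents a + bξ as the pair (a, b). It checks by brute force that this map respects addition, multiplication, and scalar multiplication. It also checks that the elements it fixes are exactly those with b = 0, mirroring Example 16.10 and Theorem 16.6. All names are illustrative.

```python
from itertools import product

P = 3  # R = Z_3; f = X^2 + 1, so xi^2 = -1 in E = Z_3[X]/(f)


def mul(u, v):
    """(a + b*xi)(c + d*xi) = (ac - bd) + (ad + bc)*xi, using xi^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def conj(u):
    """The map g(xi) |-> g(-xi), i.e. a + b*xi |-> a - b*xi."""
    return (u[0], (-u[1]) % P)


ELEMENTS = list(product(range(P), repeat=2))    # all 9 elements of E


def is_algebra_hom():
    """Brute-force check that conj preserves +, *, and scalar multiplication."""
    for u, v in product(ELEMENTS, repeat=2):
        if conj(add(u, v)) != add(conj(u), conj(v)):
            return False
        if conj(mul(u, v)) != mul(conj(u), conj(v)):
            return False
    # scalar multiplication by c in Z_3 is multiplication by tau(c) = (c, 0)
    return all(conj(mul((c, 0), u)) == mul((c, 0), conj(u))
               for c in range(P) for u in ELEMENTS)


# The subalgebra fixed by conj (Theorem 16.6): exactly the copy of Z_3.
FIXED = [u for u in ELEMENTS if conj(u) == u]
```

Since E here is a field, Theorem 16.6 predicts that the fixed subalgebra is a field too; FIXED comes out as the three elements (0, 0), (1, 0), (2, 0), the embedded copy of ℤ_3.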

Exercise 16.1. Let E be an R-algebra. For α ∈ E, consider the α-multiplication
map on E, which sends β ∈ E to αβ ∈ E. Show that this map is an R-linear map.

Exercise 16.2. Show that every ring may be viewed in a unique way as a ℤ-algebra,
and that subrings are subalgebras, and ring homomorphisms are ℤ-algebra
homomorphisms.

Exercise 16.3. Show that the only ℝ-algebra homomorphisms from ℂ into itself
are the identity map and the complex conjugation map.


16.2 The field of fractions of an integral domain 

Let D be an integral domain. Just as we can construct the field of rational numbers
by forming fractions involving integers, we can construct a field consisting of fractions
whose numerators and denominators are elements of D. This construction is
quite straightforward, though a bit tedious.

To begin with, let S be the set of all pairs of the form (a, b), with a, b ∈ D
and b ≠ 0_D. Intuitively, such a pair (a, b) is a “formal fraction,” with numerator
a and denominator b. We define a binary relation ~ on S as follows: for
(a_1, b_1), (a_2, b_2) ∈ S, we say (a_1, b_1) ~ (a_2, b_2) if and only if a_1 b_2 = a_2 b_1. Our
first task is to show that this is an equivalence relation:

Lemma 16.8. For all (a_1, b_1), (a_2, b_2), (a_3, b_3) ∈ S, we have

(i) (a_1, b_1) ~ (a_1, b_1);

(ii) (a_1, b_1) ~ (a_2, b_2) implies (a_2, b_2) ~ (a_1, b_1);

(iii) (a_1, b_1) ~ (a_2, b_2) and (a_2, b_2) ~ (a_3, b_3) implies (a_1, b_1) ~ (a_3, b_3).

Proof. (i) and (ii) are rather trivial, and we do not comment on these any further. As
for (iii), assume that a_1 b_2 = a_2 b_1 and a_2 b_3 = a_3 b_2. Multiplying the first equation
by b_3, we obtain a_1 b_2 b_3 = a_2 b_1 b_3, and substituting a_3 b_2 for a_2 b_3 on the right-hand
side of this last equation, we obtain a_1 b_2 b_3 = a_3 b_2 b_1. Now, using the fact that b_2
is non-zero and that D is an integral domain, we may cancel b_2 from both sides,
obtaining a_1 b_3 = a_3 b_1. □





Since ~ is an equivalence relation, it partitions S into equivalence classes, and
for (a, b) ∈ S, we denote by [a, b] the equivalence class containing (a, b), and
we denote by K the set of all such equivalence classes. Our next task is to define
addition and multiplication operations on equivalence classes, mimicking the usual
rules of arithmetic with fractions. We want to define the sum of [a_1, b_1] and [a_2, b_2]
to be [a_1 b_2 + a_2 b_1, b_1 b_2], and the product of [a_1, b_1] and [a_2, b_2] to be [a_1 a_2, b_1 b_2].
Note that since D is an integral domain, if b_1 and b_2 are non-zero, then so is the
product b_1 b_2, and therefore [a_1 b_2 + a_2 b_1, b_1 b_2] and [a_1 a_2, b_1 b_2] are indeed
equivalence classes. However, to ensure that this definition is unambiguous, and does
not depend on the particular choice of representatives of the equivalence classes
[a_1, b_1] and [a_2, b_2], we need the following lemma.

Lemma 16.9. Let (a_1, b_1), (a_1', b_1'), (a_2, b_2), (a_2', b_2') ∈ S, where (a_1, b_1) ~ (a_1', b_1')
and (a_2, b_2) ~ (a_2', b_2'). Then we have

$$(a_1 b_2 + a_2 b_1,\ b_1 b_2) \sim (a_1' b_2' + a_2' b_1',\ b_1' b_2')$$

and

$$(a_1 a_2,\ b_1 b_2) \sim (a_1' a_2',\ b_1' b_2').$$

Proof. This is a straightforward calculation. Since a_1 b_1' = a_1' b_1 and a_2 b_2' = a_2' b_2,
we have

$$\begin{aligned} (a_1 b_2 + a_2 b_1) b_1' b_2' &= a_1 b_2 b_1' b_2' + a_2 b_1 b_1' b_2' = a_1' b_2 b_1 b_2' + a_2' b_1 b_1' b_2 \\ &= (a_1' b_2' + a_2' b_1') b_1 b_2 \end{aligned}$$

and

$$a_1 a_2 b_1' b_2' = a_1' a_2 b_1 b_2' = a_1' a_2' b_1 b_2.$$

□

In light of this lemma, we may unambiguously define addition and multiplication
on K as follows: for [a_1, b_1], [a_2, b_2] ∈ K, we define

$$[a_1, b_1] + [a_2, b_2] := [a_1 b_2 + a_2 b_1,\ b_1 b_2]$$

and

$$[a_1, b_1] \cdot [a_2, b_2] := [a_1 a_2,\ b_1 b_2].$$

The next task is to show that K is a ring; we leave the details of this (which
are quite straightforward) to the reader.

Lemma 16.10. With addition and multiplication as defined above, K is a ring,
with additive identity [0_D, 1_D] and multiplicative identity [1_D, 1_D].


Proof. Exercise. □ 





Finally, we observe that K is in fact a field: it is clear that [a, b] is a non-zero
element of K if and only if a ≠ 0_D, and hence any non-zero element [a, b] of K
has a multiplicative inverse, namely, [b, a].

The field K is called the field of fractions of D. Consider the map τ : D → K
that sends a ∈ D to [a, 1_D] ∈ K. It is easy to see that this map is a ring homomorphism,
and one can also easily verify that it is injective. So, starting from D, we
can synthesize “out of thin air” its field of fractions K, which essentially contains
D as a subring, via the natural embedding τ : D → K.


Now suppose that we are given a field L that contains D as a subring. Consider
the set K' consisting of all elements of L of the form ab^{-1}, where a, b ∈ D and
b ≠ 0_D; note that here, the arithmetic operations are performed using the rules
for arithmetic in L. One may easily verify that K' is a subfield of L that contains
D, and it is easy to see that this is the smallest subfield of L that contains D. The
subfield K' of L may be referred to as the field of fractions of D within L. One
may easily verify that the map ρ : K → L that sends [a, b] ∈ K to ab^{-1} ∈ L is an
unambiguously defined ring homomorphism that maps K injectively onto K'. If
we view K and L as D-algebras via inclusion, we see that the map ρ is in fact
a D-algebra homomorphism. Thus, K and K' are isomorphic as D-algebras. It is
in this sense that the field of fractions K is the smallest field that contains D as a
subring.


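From now on, we shall simply write an element [a, b] of K as the fraction a/b.
In this notation, the above rules for addition, multiplication, and testing equality in
K now look quite familiar:

$$\frac{a_1}{b_1} + \frac{a_2}{b_2} = \frac{a_1 b_2 + a_2 b_1}{b_1 b_2}, \qquad \frac{a_1}{b_1} \cdot \frac{a_2}{b_2} = \frac{a_1 a_2}{b_1 b_2}, \qquad \text{and} \qquad \frac{a_1}{b_1} = \frac{a_2}{b_2} \iff a_1 b_2 = a_2 b_1.$$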
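The construction can be mirrored directly in code. This sketch assumes D = ℤ (so that K is ℚ). It deliberately does not reduce fractions to lowest terms, so it works with the raw pairs (a, b) and the relation a_1 b_2 = a_2 b_1, exactly as in the construction. Python's own fractions.Fraction normalizes its representation, which is why it is not used here. The class name FormalFraction is hypothetical.

```python
class FormalFraction:
    """A pair (a, b) with b != 0, standing for the class [a, b] in the field of fractions of Z."""

    def __init__(self, a: int, b: int):
        if b == 0:
            raise ValueError("denominator must be non-zero")
        self.a, self.b = a, b            # no normalization: (1, 2) and (2, 4) stay distinct pairs

    def __eq__(self, other):
        # (a1, b1) ~ (a2, b2) iff a1*b2 == a2*b1: equality of classes, not of pairs
        if not isinstance(other, FormalFraction):
            return NotImplemented
        return self.a * other.b == other.a * self.b

    def __add__(self, other):
        # [a1, b1] + [a2, b2] := [a1*b2 + a2*b1, b1*b2]
        return FormalFraction(self.a * other.b + other.a * self.b, self.b * other.b)

    def __mul__(self, other):
        # [a1, b1] * [a2, b2] := [a1*a2, b1*b2]
        return FormalFraction(self.a * other.a, self.b * other.b)

    def inverse(self):
        # [a, b] is non-zero iff a != 0, and then its inverse is [b, a]
        if self.a == 0:
            raise ZeroDivisionError("zero has no multiplicative inverse")
        return FormalFraction(self.b, self.a)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"
```

For instance, FormalFraction(1, 2) and FormalFraction(2, 4) are different pairs but compare equal, and adding or multiplying either one by FormalFraction(-3, 5) gives equal results, which is the content of Lemma 16.9 in this special case.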


Function fields 

An important special case of the above construction for the field of fractions of D 
is when D = F[X], where F is a field. In this case, the field of fractions is denoted
F(X), and is called the field of rational functions (over F). This terminology is
a bit unfortunate, since just as with polynomials, although the elements of F(X) 
define functions, they are not (in general) in one-to-one correspondence with these 
functions. 

Since F[X] is a subring of F(X), and since F is a subring of F[X], we see that 
F is a subfield of F(X). 

More generally, we may apply the above construction to D = F[X_1, ..., X_n],
the ring of multivariate polynomials over the field F, in which case the field of
fractions is denoted F(X_1, ..., X_n), and is also called the field of rational functions
(over F, in the variables X_1, ..., X_n).


Exercise 16.4. Let F be a field of characteristic zero. Show that F contains an
isomorphic copy of ℚ.

Exercise 16.5. Show that the field of fractions of ℤ[i] within ℂ is ℚ[i]. (See
Example 7.25 and Exercise 7.14.)


16.3 Unique factorization of polynomials 

Throughout this section, F denotes a field. 

Like the ring ℤ, the ring F[X] of polynomials is an integral domain, and because
of the division with remainder property for polynomials, F[X] has many other
properties in common with ℤ. Indeed, essentially all the ideas and results from
Chapter 1 can be carried over almost verbatim from ℤ to F[X], and in this section,
we shall do just that.

Recall that the units of F[X] are precisely the units F* of F, that is, the non-zero
constants. We call two polynomials g, h ∈ F[X] associate if g = ch for some
c ∈ F*. It is easy to see that g and h are associate if and only if g | h and h | g;
indeed, this follows as a special case of part (i) of Theorem 7.4. Clearly, any non-zero
polynomial g is associate to a unique monic polynomial (i.e., a polynomial
with leading coefficient 1), called the monic associate of g; indeed, the monic
associate of g is lc(g)^{-1} · g (where, as usual, lc(g) denotes the leading coefficient
of g).

We call a polynomial f ∈ F[X] irreducible if it is non-constant and all divisors
of f are associate to 1 or f. Conversely, we call f reducible if it is non-constant
and is not irreducible. Equivalently, a non-constant polynomial f is reducible if
and only if there exist polynomials g, h ∈ F[X] of degree strictly less than that of
f such that f = gh.

Clearly, if g and h are associate polynomials, then g is irreducible if and only if 
h is irreducible. 

The irreducible polynomials play a role similar to that of the prime numbers. Just 
as it is convenient to work with only positive prime numbers, it is also convenient 
to restrict attention to monic irreducible polynomials. 

Corresponding to Theorem 1.3, every non-zero polynomial can be expressed as 
a unit times a product of monic irreducibles in an essentially unique way: 

Theorem 16.11. Every non-zero polynomial f ∈ F[X] can be expressed as

$$f = c \cdot p_1^{e_1} \cdots p_r^{e_r},$$

where c ∈ F*, p_1, ..., p_r are distinct monic irreducible polynomials, and e_1, ..., e_r
are positive integers. Moreover, this expression is unique, up to a reordering of the
irreducible polynomials.

To prove this theorem, we may assume that f is monic, since the non-monic
case trivially reduces to the monic case.

The proof of the existence part of Theorem 16.11 is just as for Theorem 1.3. If
f is 1 or a monic irreducible, we are done. Otherwise, there exist g, h ∈ F[X] of
degree strictly less than that of f such that f = gh, and again, we may assume that
g and h are monic. By induction on degree, both g and h can be expressed as a
product of monic irreducible polynomials, and hence, so can f.

The proof of the uniqueness part of Theorem 16.11 is almost identical to that of
Theorem 1.3. The key to the proof is the division with remainder property, Theorem
7.10, from which we can easily derive the following analog of Theorem 1.6:

Theorem 16.12. Let I be an ideal of F[X]. Then there exists a unique polynomial
d ∈ F[X] such that I = dF[X] and d is either zero or monic.

Proof. We first prove the existence part of the theorem. If I = {0}, then d = 0 does
the job, so let us assume that I ≠ {0}. Since I contains non-zero polynomials, it
must contain monic polynomials, since if g is a non-zero polynomial in I, then its
monic associate lc(g)^{-1} g is also in I. Let d be a monic polynomial of minimal
degree in I. We want to show that I = dF[X].

We first show that I ⊆ dF[X]. To this end, let g be any element in I. It suffices
to show that d | g. Using Theorem 7.10, we may write g = dq + r, where
deg(r) < deg(d). Then by the closure properties of ideals, one sees that r = g − dq
is also an element of I, and by the minimality of the degree of d, we must have
r = 0. Thus, d | g.

We next show that dF[X] ⊆ I. This follows immediately from the fact that
d ∈ I and the closure properties of ideals.

That proves the existence part of the theorem. As for uniqueness, note that if
dF[X] = eF[X], we have d | e and e | d, from which it follows that d and e are
associate, and so if d and e are both either monic or zero, they must be equal. □

For g, h ∈ F[X], we call d ∈ F[X] a common divisor of g and h if d | g and
d | h; moreover, we call such a d a greatest common divisor of g and h if d is
monic or zero, and all other common divisors of g and h divide d. Analogous to
Theorem 1.7, we have:

Theorem 16.13. For all g, h ∈ F[X], there exists a unique greatest common divisor
d of g and h, and moreover, gF[X] + hF[X] = dF[X].

Proof. We apply the previous theorem to the ideal I := gF[X] + hF[X]. Let
d ∈ F[X] with I = dF[X], as in that theorem. Note that g, h, d ∈ I and d is monic
or zero.

It is clear that d is a common divisor of g and h. Moreover, there exist s, t ∈ F[X]
such that gs + ht = d. If d' | g and d' | h, then clearly d' | (gs + ht), and hence
d' | d.

Finally, for uniqueness, if e is a greatest common divisor of g and h, then d | e
and e | d, and hence e is associate to d, and the requirement that e is monic or zero
implies that e = d. □

For g, h ∈ F[X], we denote by gcd(g, h) the greatest common divisor of g and
h. Note that as we have defined it, lc(g) gcd(g, 0) = g. Also note that when at least
one of g or h is non-zero, gcd(g, h) is the unique monic polynomial of maximal
degree that divides both g and h.

An immediate consequence of Theorem 16.13 is that for all g, h ∈ F[X], there
exist s, t ∈ F[X] such that gs + ht = gcd(g, h), and that when at least one of g or h
is non-zero, gcd(g, h) is the unique monic polynomial of minimal degree that can
be expressed as gs + ht for some s, t ∈ F[X].
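Theorems 16.12 and 16.13 are existence statements, but division with remainder also yields a procedure for finding s and t. The sketch below assumes F = ℤ_p with p prime. It runs the extended Euclidean algorithm and returns gcd(g, h), normalized to be monic or zero, together with s, t such that gs + ht = gcd(g, h). Polynomials are coefficient lists with the lowest degree first and no trailing zeros, and the zero polynomial is []. It uses pow(x, -1, p) for modular inverses, which needs Python 3.8 or later. All function names are hypothetical.

```python
def trim(a):
    """Remove trailing zero coefficients in place; [] is the zero polynomial."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_sub(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])


def poly_mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def poly_divmod(a, b, p):
    """Division with remainder (Theorem 7.10): a = b*q + r with deg(r) < deg(b).

    b must be non-zero with no trailing zeros; its leading coefficient is a unit mod p.
    """
    r = trim([c % p for c in a])
    q = [0] * max(len(r) - len(b) + 1, 0)
    inv_lc = pow(b[-1], -1, p)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1] * inv_lc % p
        q[shift] = c
        for j, x in enumerate(b):             # subtract c * X^shift * b
            r[shift + j] = (r[shift + j] - c * x) % p
        trim(r)
    return trim(q), r


def xgcd(g, h, p):
    """Return (d, s, t) with d = gcd(g, h) (monic or zero) and g*s + h*t = d."""
    r0, r1 = trim([c % p for c in g]), trim([c % p for c in h])
    s0, s1, t0, t1 = [1], [], [], [1]         # invariant: r_i = g*s_i + h*t_i
    while r1:
        q, r = poly_divmod(r0, r1, p)
        r0, r1 = r1, r
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1, p), p)
        t0, t1 = t1, poly_sub(t0, poly_mul(q, t1, p), p)
    if not r0:                                # g = h = 0, so gcd(g, h) = 0
        return [], s0, t0
    inv = pow(r0[-1], -1, p)                  # pass to the monic associate
    return ([c * inv % p for c in r0],
            [c * inv % p for c in s0],
            [c * inv % p for c in t0])
```

For g = (X − 1)(X − 2) and h = (X − 1)(X + 1) over ℤ_5, that is, g = [2, 2, 1] and h = [4, 0, 1], xgcd returns d = [4, 1] (the polynomial X − 1), s = [3], and t = [2], and indeed 3g + 2h = X − 1 in ℤ_5[X].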

We say that g, h ∈ F[X] are relatively prime if gcd(g, h) = 1, which is
the same as saying that the only common divisors of g and h are units. It is
immediate from Theorem 16.13 that g and h are relatively prime if and only if
gF[X] + hF[X] = F[X], which holds if and only if there exist s, t ∈ F[X] such
that gs + ht = 1.

Analogous to Theorem 1.9, we have: 

Theorem 16.14. For f, g, h ∈ F[X] such that f | gh and gcd(f, g) = 1, we have
f | h.

Proof. Suppose that f | gh and gcd(f, g) = 1. Then since gcd(f, g) = 1, by
Theorem 16.13 we have fs + gt = 1 for some s, t ∈ F[X]. Multiplying this
equation by h, we obtain fhs + ght = h. Since f | fhs trivially, and f | ght because
f | gh by hypothesis, it follows that f | h. □

Analogous to Theorem 1.10, we have: 

Theorem 16.15. Let p ∈ F[X] be irreducible, and let g, h ∈ F[X]. Then p | gh
implies that p | g or p | h.

Proof. Assume that p | gh. The only divisors of p are associate to 1 or p. Thus,
gcd(p, g) is either 1 or the monic associate of p. If p | g, we are done; otherwise, if
p ∤ g, we must have gcd(p, g) = 1, and by the previous theorem, we conclude that
p | h. □

Now to prove the uniqueness part of Theorem 16.11. Suppose we have

$$p_1 \cdots p_r = q_1 \cdots q_s,$$

where p_1, ..., p_r and q_1, ..., q_s are monic irreducible polynomials (with duplicates
allowed among the p_i's and among the q_j's). If r = 0, we must have s = 0 and
we are done. Otherwise, as p_1 divides the right-hand side, by inductively applying
Theorem 16.15, one sees that p_1 divides q_j for some j; since q_j is irreducible and
p_1 and q_j are both monic and non-constant, p_1 is equal to q_j. We can cancel these
terms and proceed inductively (on r).

That completes the proof of Theorem 16.11. 
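For small examples, Theorem 16.11 can be seen concretely by brute force. The sketch below assumes F = ℤ_2, so every non-zero polynomial is already monic and the unit c is 1. It encodes a polynomial as a Python int whose bit i is the coefficient of X^i. It factors f by repeatedly dividing out the divisor of smallest positive degree, which is necessarily irreducible, since a non-trivial factor of it would be a divisor of f of even smaller degree. This trial division is exponential in deg(f) and is meant only as an illustration; the function names are hypothetical.

```python
def pmod2(a: int, b: int) -> int:
    """Remainder of a modulo b in Z_2[X] (b != 0); bit i is the coefficient of X^i."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)   # subtract (= add) X^shift * b
    return a


def pdiv2(a: int, b: int) -> int:
    """Quotient of a by b in Z_2[X] (b != 0)."""
    q, db = 0, b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q |= 1 << shift
        a ^= b << shift
    return q


def factor2(f: int) -> dict:
    """Factor non-zero f in Z_2[X] as {monic irreducible: exponent}, by trial division."""
    if f == 0:
        raise ValueError("f must be non-zero")
    factors = {}
    d = 2                                 # the polynomial X; ints increase with degree
    while f.bit_length() > 1:             # while f is not the constant 1
        if pmod2(f, d) == 0:              # smallest remaining divisor: irreducible
            factors[d] = factors.get(d, 0) + 1
            f = pdiv2(f, d)
        else:
            d += 1                        # candidates below d can never divide f again
    return factors
```

For example, factor2(0b1001) returns {0b11: 1, 0b111: 1}, that is, X^3 + 1 = (X + 1)(X^2 + X + 1) over ℤ_2, and factor2(0b101) returns {0b11: 2}, that is, X^2 + 1 = (X + 1)^2.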

Analogous to Theorem 1.11, we have: 

Theorem 16.16. There are infinitely many monic irreducible polynomials in F[X].

If F is infinite, then this theorem is true simply because there are infinitely
many monic, linear polynomials; in any case, one can easily prove this theorem
by mimicking the proof of Theorem 1.11 (as the reader may verify).

For a monic irreducible polynomial p, we may define the function v_p, mapping
non-zero polynomials to non-negative integers, as follows: for every polynomial
f ≠ 0, if f = p^e g, where p ∤ g, then v_p(f) := e. We may then write the factorization
of f into irreducibles as

$$f = c \prod_p p^{v_p(f)},$$

where the product is over all monic irreducible polynomials p, with all but finitely
many of the terms in the product equal to 1.

Just as for integers, we may extend the domain of definition of $\nu_p$ to include 0, defining $\nu_p(0) := \infty$. For all polynomials $g, h$, we have

$$\nu_p(g \cdot h) = \nu_p(g) + \nu_p(h) \quad \text{for all } p. \tag{16.1}$$

From this, it follows that for all polynomials $g, h$, we have

$$h \mid g \iff \nu_p(h) \le \nu_p(g) \quad \text{for all } p, \tag{16.2}$$

and

$$\nu_p(\gcd(g, h)) = \min(\nu_p(g), \nu_p(h)) \quad \text{for all } p. \tag{16.3}$$

For $g, h \in F[X]$, a common multiple of $g$ and $h$ is a polynomial $m$ such that $g \mid m$ and $h \mid m$; moreover, such an $m$ is the least common multiple of $g$ and $h$ if $m$ is monic or zero, and $m$ divides all common multiples of $g$ and $h$. In light of Theorem 16.11, it is clear that the least common multiple exists and is unique, and we denote the least common multiple of $g$ and $h$ by $\operatorname{lcm}(g, h)$. Note that as we have defined it, $\operatorname{lcm}(g, 0) = 0$, and that when both $g$ and $h$ are non-zero, $\operatorname{lcm}(g, h)$ is the unique monic polynomial of minimal degree that is divisible by both $g$ and $h$. Also, for all $g, h \in F[X]$, we have

$$\nu_p(\operatorname{lcm}(g, h)) = \max(\nu_p(g), \nu_p(h)) \quad \text{for all } p. \tag{16.4}$$

Just as in §1.3, the notions of greatest common divisor and least common multiple generalize naturally from two to any number of polynomials. We also say that a family of polynomials $\{g_i\}_{i=1}^k$ is pairwise relatively prime if $\gcd(g_i, g_j) = 1$ for all indices $i, j$ with $i \neq j$.

Also just as in §1.3, any rational function $g/h \in F(X)$ can be expressed as a fraction $g_0/h_0$ in lowest terms (that is, $g/h = g_0/h_0$ and $\gcd(g_0, h_0) = 1$), and this representation is unique up to multiplication by units.

Many of the exercises in Chapter 1 carry over naturally to polynomials — the 
reader is encouraged to look over all of the exercises in that chapter, determining 
which have natural polynomial analogs, and work some of these out. 

Example 16.12. Let $f \in F[X]$ be a polynomial of degree 2 or 3. Then it is easy to see that $f$ is irreducible if and only if $f$ has no roots in $F$. Indeed, if $f$ is reducible, then it must have a factor of degree 1, which we can assume is monic; thus, we can write $f = (X - x)g$, where $x \in F$ and $g \in F[X]$, and so $f(x) = (x - x)g(x) = 0$. Conversely, if $x \in F$ is a root of $f$, then $X - x$ divides $f$ (see Theorem 7.12), and so $f$ is reducible. □
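To make the root test of Example 16.12 concrete, here is a minimal sketch over $\mathbb{Z}_p$. It assumes $p$ is prime, so that $\mathbb{Z}_p$ is a field. It also assumes a polynomial is stored as a Python list of coefficients, lowest degree first. The function names `poly_eval` and `is_irreducible_deg2or3` are hypothetical and do not come from the text. The test checks every $x \in \mathbb{Z}_p$ for a root, which is only a valid irreducibility test in degrees 2 and 3.

```python
def poly_eval(coeffs, x, p):
    """Evaluate sum(coeffs[i] * x**i) modulo p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def is_irreducible_deg2or3(coeffs, p):
    """Root test of Example 16.12 over Z_p, for a prime p (assumed).

    coeffs lists the coefficients of f, lowest degree first.
    The test is only valid when deg(f) is 2 or 3.
    """
    f = [c % p for c in coeffs]
    while f and f[-1] == 0:          # drop zero leading coefficients
        f.pop()
    if len(f) - 1 not in (2, 3):
        raise ValueError("the root test applies only to degree 2 or 3")
    # f is irreducible iff no x in Z_p is a root of f
    return all(poly_eval(f, x, p) != 0 for x in range(p))


# X^2 + X + 1 over Z_2 has no roots, so it is irreducible.
print(is_irreducible_deg2or3([1, 1, 1], 2))       # True
# X^3 - 2 over Z_5 has the root 3 (since 27 = 2 mod 5).
print(is_irreducible_deg2or3([-2, 0, 0, 1], 5))   # False
```

For a polynomial of degree 4 or more, having no roots does not rule out a product of two quadratics. That is why the sketch refuses such inputs.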

Example 16.13. As a special case of the previous example, consider the polynomials $f := X^2 - 2 \in \mathbb{Q}[X]$ and $g := X^3 - 2 \in \mathbb{Q}[X]$. We claim that as polynomials over $\mathbb{Q}$, $f$ and $g$ are irreducible. Indeed, neither of them has integer roots, and so neither of them has rational roots (see Exercise 1.26); therefore, they are irreducible. □

Example 16.14. In discussing the factorization of polynomials, one must be clear about the coefficient domain. Indeed, if we view $f$ and $g$ in the previous example as polynomials over $\mathbb{R}$, then they factor into irreducibles as

$$f = (X - \sqrt{2})(X + \sqrt{2}), \qquad g = (X - \sqrt[3]{2})(X^2 + \sqrt[3]{2}\,X + \sqrt[3]{4}),$$

and over $\mathbb{C}$, $g$ factors even further, as

$$g = (X - \sqrt[3]{2})\bigl(X - \sqrt[3]{2}\,(-1 + i\sqrt{3})/2\bigr)\bigl(X - \sqrt[3]{2}\,(-1 - i\sqrt{3})/2\bigr). \quad □$$

Exercise 16.6. Suppose $f = \sum_{i=0}^{\ell} c_i X^i$ is an irreducible polynomial over $F$, where $c_0 \neq 0$ and $c_\ell \neq 0$. Show that the "reverse" polynomial $\tilde{f} := \sum_{i=0}^{\ell} c_{\ell - i} X^i$ is also irreducible.
16.4 Polynomial congruences

Throughout this section, F denotes a field. 

Many of the results from Chapter 2 on congruences modulo a positive integer $n$ carry over almost verbatim to congruences modulo a non-zero polynomial $f \in F[X]$. We state these results here. The proofs are essentially the same as in the integer case, and as such, are omitted for the most part.

Because of the division with remainder property for polynomials, we have the 
analog of Theorem 2.4: 

Theorem 16.17. Let $g, f \in F[X]$, where $f \neq 0$. Then there exists a unique $z \in F[X]$ such that $z \equiv g \pmod{f}$ and $\deg(z) < \deg(f)$, namely, $z := g \bmod f$.
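The remainder $g \bmod f$ in Theorem 16.17 comes from ordinary long division of polynomials. The sketch below does this over $\mathbb{Z}_p$. It makes the same assumptions as before: $p$ is prime, coefficient lists are lowest degree first, and the empty list is the zero polynomial. The names `trim` and `poly_divmod` are hypothetical.

```python
def trim(a):
    """Remove zero leading coefficients (in place) and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_divmod(g, f, p):
    """Return (q, r) with g = q*f + r and deg(r) < deg(f), over Z_p (p prime).

    Polynomials are lists of coefficients, lowest degree first;
    the zero polynomial is the empty list.
    """
    g = trim([c % p for c in g])
    f = trim([c % p for c in f])
    if not f:
        raise ZeroDivisionError("f must be non-zero")
    inv_lc = pow(f[-1], -1, p)          # inverse of lc(f); needs Python 3.8+
    q = [0] * max(len(g) - len(f) + 1, 0)
    r = g
    while len(r) >= len(f):
        shift = len(r) - len(f)
        c = r[-1] * inv_lc % p          # next quotient coefficient
        q[shift] = c
        for i, fc in enumerate(f):      # r -= c * X^shift * f
            r[shift + i] = (r[shift + i] - c * fc) % p
        trim(r)
    return trim(q), r


# (X^4 + X^3 + X + 1) mod (X^4 + X^3 + 1) over Z_2 is X.
print(poly_divmod([1, 1, 0, 1, 1], [1, 0, 0, 1, 1], 2))   # ([1], [0, 1])
```

As a sanity check, dividing by $X - x$ leaves the constant remainder $g(x)$. For example, dividing $X^3 + 2$ by $X - 1$ over $\mathbb{Z}_5$ gives remainder $3$.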

Corresponding to Theorem 2.5, we have: 

Theorem 16.18. Let $g, f \in F[X]$ with $f \neq 0$, and let $d := \gcd(g, f)$.

(i) For every $h \in F[X]$, the congruence $gz \equiv h \pmod{f}$ has a solution $z \in F[X]$ if and only if $d \mid h$.

(ii) For every $z \in F[X]$, we have $gz \equiv 0 \pmod{f}$ if and only if $z \equiv 0 \pmod{f/d}$.

(iii) For all $z, z' \in F[X]$, we have $gz \equiv gz' \pmod{f}$ if and only if $z \equiv z' \pmod{f/d}$.

Let $g, f \in F[X]$ with $f \neq 0$. Part (iii) of Theorem 16.18 gives us a cancellation law for polynomial congruences:

if $\gcd(g, f) = 1$ and $gz \equiv gz' \pmod{f}$, then $z \equiv z' \pmod{f}$.

We say that $z \in F[X]$ is a multiplicative inverse of $g$ modulo $f$ if $gz \equiv 1 \pmod{f}$. Part (i) of Theorem 16.18 says that $g$ has a multiplicative inverse modulo $f$ if and only if $\gcd(g, f) = 1$. Moreover, part (iii) of Theorem 16.18 says that the multiplicative inverse of $g$, if it exists, is uniquely determined modulo $f$.

As for integers, we may generalize the "mod" operation as follows. Suppose $g, h, f \in F[X]$, with $f \neq 0$, $g \neq 0$, and $\gcd(g, f) = 1$. If $s$ is the rational function $h/g \in F(X)$, then we define $s \bmod f$ to be the unique polynomial $z \in F[X]$ satisfying

$$gz \equiv h \pmod{f} \quad \text{and} \quad \deg(z) < \deg(f).$$

With this notation, we can simply write $g^{-1} \bmod f$ to denote the unique multiplicative inverse of $g$ modulo $f$ of degree less than $\deg(f)$.
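One way to compute $g^{-1} \bmod f$ is the extended Euclidean algorithm for polynomials. The sketch below assumes $F = \mathbb{Z}_p$ with $p$ prime and coefficient lists stored lowest degree first, and all function names are hypothetical. It keeps only the half of the Bézout data that is needed: the invariant $t_i g \equiv r_i \pmod{f}$. When the last non-zero remainder is a non-zero constant $c$, rescaling by $c^{-1}$ gives the inverse. Otherwise $\gcd(g, f) \neq 1$, and the sketch returns `None`.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_sub(a, b, p):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def poly_mul(a, b, p):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def poly_divmod(g, f, p):
    """Long division over Z_p: returns (q, r) with g = q*f + r."""
    g, f = trim([c % p for c in g]), trim([c % p for c in f])
    inv_lc = pow(f[-1], -1, p)
    q, r = [0] * max(len(g) - len(f) + 1, 0), g
    while len(r) >= len(f):
        shift = len(r) - len(f)
        c = r[-1] * inv_lc % p
        q[shift] = c
        for i, fc in enumerate(f):
            r[shift + i] = (r[shift + i] - c * fc) % p
        trim(r)
    return trim(q), r


def inverse_mod(g, f, p):
    """Return g^{-1} mod f over Z_p, or None if gcd(g, f) != 1."""
    r0, r1 = trim([c % p for c in f]), poly_divmod(g, f, p)[1]
    t0, t1 = [], [1]                  # invariant: t_i * g = r_i (mod f)
    while r1:
        q, r = poly_divmod(r0, r1, p)
        r0, r1 = r1, r
        t0, t1 = t1, poly_sub(t0, poly_mul(q, t1, p), p)
    if len(r0) != 1:                  # gcd is not a non-zero constant
        return None
    c = pow(r0[0], -1, p)             # scale so that t0 * g = 1 (mod f)
    return poly_mul(poly_divmod(t0, f, p)[1], [c], p)


# X^{-1} mod (X^4 + X^3 + 1) over Z_2 is X^3 + X^2,
# since X * (X^3 + X^2) = X^4 + X^3 = 1 modulo that polynomial.
print(inverse_mod([0, 1], [1, 0, 0, 1, 1], 2))   # [0, 0, 1, 1]
```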

Corresponding to Theorem 2.6, we have: 
Theorem 16.19 (Chinese remainder theorem). Let $\{f_i\}_{i=1}^k$ be a pairwise relatively prime family of non-zero polynomials in $F[X]$, and let $g_1, \ldots, g_k$ be arbitrary polynomials in $F[X]$. Then there exists a solution $g \in F[X]$ to the system of congruences

$$g \equiv g_i \pmod{f_i} \quad (i = 1, \ldots, k).$$

Moreover, any $g' \in F[X]$ is a solution to this system of congruences if and only if $g \equiv g' \pmod{f}$, where $f := \prod_{i=1}^k f_i$.

Let us recall the formula for the solution $g$ (see proof of Theorem 2.6). We have

$$g := \sum_{i=1}^{k} g_i e_i,$$

where

$$e_i := f_i^* t_i, \quad f_i^* := f/f_i, \quad t_i := (f_i^*)^{-1} \bmod f_i \quad (i = 1, \ldots, k).$$

Now, let us consider the special case of the Chinese remainder theorem where $f_i = X - x_i$ with $x_i \in F$, and $g_i = y_i \in F$, for $i = 1, \ldots, k$. The condition that $\{f_i\}_{i=1}^k$ is pairwise relatively prime is equivalent to the condition that the $x_i$'s are distinct. Observe that a polynomial $g \in F[X]$ satisfies the system of congruences

$$g \equiv g_i \pmod{f_i} \quad (i = 1, \ldots, k)$$

if and only if

$$g(x_i) = y_i \quad (i = 1, \ldots, k).$$

Moreover, we have $f_i^* = \prod_{j \neq i}(X - x_j)$ and $t_i = 1/\prod_{j \neq i}(x_i - x_j) \in F$. So we get

$$g = \sum_{i=1}^{k} y_i \frac{\prod_{j \neq i}(X - x_j)}{\prod_{j \neq i}(x_i - x_j)}.$$

The reader will recognize this as the usual Lagrange interpolation formula (see 
Theorem 7.15). Thus, the Chinese remainder theorem for polynomials includes 
Lagrange interpolation as a special case. 
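The interpolation formula above translates directly into code. The sketch below assumes $F = \mathbb{Z}_p$ with $p$ prime and distinct points $x_i$ modulo $p$, and the name `lagrange_interpolate` is hypothetical. It builds $f_i^* = \prod_{j \neq i}(X - x_j)$ one linear factor at a time and computes $t_i$ as a modular inverse. It then accumulates $y_i t_i f_i^*$, exactly as in the CRT formula $g = \sum_i g_i e_i$.

```python
def lagrange_interpolate(xs, ys, p):
    """Coefficients (lowest degree first, length k) of the unique g in Z_p[X]
    with deg(g) < k and g(xs[i]) = ys[i], assuming the xs are distinct mod p."""
    k = len(xs)
    g = [0] * k
    for i in range(k):
        num = [1]      # will hold f_i^* = prod_{j != i} (X - x_j)
        denom = 1      # will hold prod_{j != i} (x_i - x_j)
        for j in range(k):
            if j == i:
                continue
            # multiply num by (X - x_j): new[t] = old[t-1] - x_j * old[t]
            num = [((num[t - 1] if t >= 1 else 0)
                    - xs[j] * (num[t] if t < len(num) else 0)) % p
                   for t in range(len(num) + 1)]
            denom = denom * (xs[i] - xs[j]) % p
        t_i = pow(denom, -1, p)   # t_i = (f_i^*)^{-1} mod (X - x_i), a constant
        for d, c in enumerate(num):
            g[d] = (g[d] + ys[i] * t_i * c) % p
    return g


# Recover g = 2 + 3X + X^2 over Z_7 from its values at X = 1, 2, 3.
print(lagrange_interpolate([1, 2, 3], [6, 5, 6], 7))   # [2, 3, 1]
```

The output always has length $k$, so a constant interpolant comes back padded with zeros.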

Polynomial quotient algebras. Let $f \in F[X]$ be a polynomial of degree $\ell \ge 0$, and consider the quotient ring $E := F[X]/(f)$. As discussed in Example 16.7, we may naturally view $E$ as an $F$-algebra. Moreover, if we set $\xi := [X]_f \in E$, then $E = F[\xi]$, and viewing $E$ as a vector space over $F$, we see that $\{\xi^{i-1}\}_{i=1}^{\ell}$ is a basis for $E$.

Now suppose $\alpha \in E$. We have $\alpha = [g]_f = g(\xi)$ for some $g \in F[X]$, and from the above discussion about polynomial congruences, we see that $\alpha$ is a unit if and only if $\gcd(g, f) = 1$.

If $\ell = 0$, then $E$ is trivial. If $f$ is irreducible, then $E$ is a field, since $g \not\equiv 0 \pmod{f}$ implies $\gcd(g, f) = 1$. If $f$ is reducible, then $E$ is not a field, and indeed, not even an integral domain: for any non-trivial factor $g \in F[X]$ of $f$, $[g]_f \in E$ is a zero divisor.

The Chinese remainder theorem for polynomials also has a more algebraic interpretation. Namely, if $\{f_i\}_{i=1}^k$ is a pairwise relatively prime family of non-zero polynomials in $F[X]$, and $f := \prod_{i=1}^k f_i$, then the map

$$\theta : F[X]/(f) \to F[X]/(f_1) \times \cdots \times F[X]/(f_k), \qquad [g]_f \mapsto ([g]_{f_1}, \ldots, [g]_{f_k})$$

is unambiguously defined, and is in fact an $F$-algebra isomorphism. This map may be seen as a generalization of the ring isomorphism $\rho$ discussed in Example 7.54.

Example 16.15. The polynomial $X^2 + 1$ is irreducible over $\mathbb{R}$, since if it were not, it would have a root in $\mathbb{R}$ (see Example 16.12), which is clearly impossible, since $-1$ is not the square of any real number. It follows immediately that $\mathbb{C} = \mathbb{R}[X]/(X^2 + 1)$ is a field, without having to explicitly calculate a formula for the inverse of a non-zero complex number. □

Example 16.16. Consider the polynomial $f := X^4 + X^3 + 1$ over $\mathbb{Z}_2$. We claim that $f$ is irreducible. It suffices to show that $f$ has no irreducible factors of degree 1 or 2.

If $f$ had a factor of degree 1, then it would have a root; however, $f(0) = 0 + 0 + 1 = 1$ and $f(1) = 1 + 1 + 1 = 1$. So $f$ has no factors of degree 1.

Does $f$ have a factor of degree 2? The polynomials of degree 2 are $X^2$, $X^2 + X$, $X^2 + 1$, and $X^2 + X + 1$. The first and second of these polynomials are divisible by $X$, and hence not irreducible, while the third has 1 as a root, and hence is also not irreducible. The last polynomial, $X^2 + X + 1$, has no roots, and hence is the only irreducible polynomial of degree 2 over $\mathbb{Z}_2$. So now we may conclude that if $f$ were not irreducible, it would have to be equal to

$$(X^2 + X + 1)^2 = X^4 + 2X^3 + 3X^2 + 2X + 1 = X^4 + X^2 + 1,$$

which it is not.

Thus, $E := \mathbb{Z}_2[X]/(f)$ is a field with $2^4 = 16$ elements. We may think of elements of $E$ as bit strings of length 4, where the rule for addition is bit-wise "exclusive-or." The rule for multiplication is more complicated: to multiply two given bit strings, we interpret the bits as coefficients of polynomials (with the left-most bit the coefficient of $X^3$), multiply the polynomials, reduce the product modulo $f$, and write down the bit string corresponding to the reduced product polynomial. For example, to multiply 1001 and 0011, we compute

$$(X^3 + 1)(X + 1) = X^4 + X^3 + X + 1,$$

and

$$(X^4 + X^3 + X + 1) \bmod (X^4 + X^3 + 1) = X.$$

Hence, the product of 1001 and 0011 is 0010.
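The bit-string rule for multiplication in $E$ can be written in a few lines. This is a sketch, not the text's own procedure. It assumes each element is stored as a Python integer whose bit $i$ is the coefficient of $X^i$, so the bit string 1001 is the integer `0b1001`. The names `F_MODULUS` and `gf16_mul` are hypothetical. Multiplication is a carry-less (XOR) product, followed by clearing any bits of degree 6, 5, and 4 with shifted copies of $f$.

```python
F_MODULUS = 0b11001  # X^4 + X^3 + 1; bit i holds the coefficient of X^i


def gf16_mul(a, b):
    """Multiply two elements of Z_2[X]/(X^4 + X^3 + 1) given as 4-bit ints."""
    prod = 0
    for i in range(4):                 # carry-less (XOR) polynomial product
        if (b >> i) & 1:
            prod ^= a << i
    for deg in range(6, 3, -1):        # reduce modulo f, from X^6 down to X^4
        if (prod >> deg) & 1:
            prod ^= F_MODULUS << (deg - 4)
    return prod


# 1001 * 0011: the product X^4 + X^3 + X + 1 reduces to X.
print(format(gf16_mul(0b1001, 0b0011), "04b"))   # 0010
```

Addition in this representation is simply `a ^ b`, matching the bit-wise exclusive-or rule.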

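Theorem 7.29 says that $E^*$ is a cyclic group. Indeed, the element $\xi := 0010$ (i.e., $\xi = [X]_f$) is a generator for $E^*$, as the following table of powers shows:

$$
\begin{array}{c|c||c|c}
i & \xi^i & i & \xi^i \\ \hline
1 & 0010 & 8 & 1110 \\
2 & 0100 & 9 & 0101 \\
3 & 1000 & 10 & 1010 \\
4 & 1001 & 11 & 1101 \\
5 & 1011 & 12 & 0011 \\
6 & 1111 & 13 & 0110 \\
7 & 0111 & 14 & 1100 \\
  &      & 15 & 0001
\end{array}
$$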

Such a table of powers is sometimes useful for computations in small finite fields such as this one. Given $\alpha, \beta \in E^*$, we can compute $\alpha\beta$ by obtaining (by table lookup) $i, j$ such that $\alpha = \xi^i$ and $\beta = \xi^j$, computing $k := (i + j) \bmod 15$, and then obtaining $\alpha\beta = \xi^k$ (again by table lookup). □
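The table of powers and the lookup-based multiplication can be reproduced as follows. This is a minimal sketch that reuses the 4-bit integer representation assumed earlier, with bit $i$ being the coefficient of $X^i$. The names `POW`, `LOG`, and `mul_via_tables` are hypothetical. Because the 15 computed powers are distinct, $\xi$ generates $E^*$ and the inverse table `LOG` is well defined. Index 15 stands in for $\xi^0 = 1$.

```python
F_MODULUS = 0b11001  # X^4 + X^3 + 1; bit i holds the coefficient of X^i


def gf16_mul(a, b):
    """Multiply 4-bit elements of Z_2[X]/(X^4 + X^3 + 1)."""
    prod = 0
    for i in range(4):                 # carry-less (XOR) polynomial product
        if (b >> i) & 1:
            prod ^= a << i
    for deg in range(6, 3, -1):        # reduce modulo f, from X^6 down to X^4
        if (prod >> deg) & 1:
            prod ^= F_MODULUS << (deg - 4)
    return prod


# POW[i] = xi^i for i = 1..15, where xi = 0010 = [X]_f
POW = {}
acc = 1
for i in range(1, 16):
    acc = gf16_mul(acc, 0b0010)
    POW[i] = acc

# LOG inverts the table: LOG[xi^i] = i (well defined since xi is a generator)
LOG = {v: i for i, v in POW.items()}


def mul_via_tables(a, b):
    """Compute a*b for non-zero a, b as xi^((i + j) mod 15) by table lookup."""
    k = (LOG[a] + LOG[b]) % 15
    return POW[k] if k != 0 else POW[15]   # xi^0 = xi^15 = 1


for i in range(1, 16):
    print(i, format(POW[i], "04b"))        # reproduces the table of powers
```

Here `mul_via_tables(0b1001, 0b0011)` computes $\xi^4 \cdot \xi^{12} = \xi^{16} = \xi$. This agrees with the direct product 0010 computed above.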


16.5 Minimal polynomials 

Throughout this section, F denotes a field. 

Suppose that $E$ is an arbitrary $F$-algebra, and let $\alpha$ be an element of $E$. Consider the polynomial evaluation map

$$\rho : F[X] \to E, \qquad g \mapsto g(\alpha),$$

which is an $F$-algebra homomorphism. By definition, the image of $\rho$ is $F[\alpha]$. The kernel of $\rho$ is an ideal of $F[X]$, and since every ideal of $F[X]$ is principal, it follows that $\operatorname{Ker}\rho = \phi F[X]$ for some polynomial $\phi \in F[X]$; moreover, we can make the choice of $\phi$ unique by insisting that it is monic or zero. The polynomial $\phi$ is called the minimal polynomial of $\alpha$ (over $F$).

On the one hand, suppose $\phi \neq 0$. Since any polynomial that is zero at $\alpha$ is a polynomial multiple of $\phi$, we see that $\phi$ is the unique monic polynomial of smallest degree that vanishes at $\alpha$. Moreover, the first isomorphism theorems for rings and modules tell us that $F[\alpha]$ is isomorphic (as an $F$-algebra) to $F[X]/(\phi)$, via the isomorphism

$$\bar{\rho} : F[X]/(\phi) \to F[\alpha], \qquad [g]_\phi \mapsto g(\alpha).$$

Under this isomorphism, $[X]_\phi \in F[X]/(\phi)$ corresponds to $\alpha \in F[\alpha]$, and we see that $\{\alpha^{i-1}\}_{i=1}^{m}$ is a basis for $F[\alpha]$ over $F$, where $m = \deg(\phi)$. In particular, every element of $F[\alpha]$ can be written uniquely as $\sum_{i=1}^{m} c_i \alpha^{i-1}$, where $c_1, \ldots, c_m \in F$.

On the other hand, suppose $\phi = 0$. This means that no non-zero polynomial vanishes at $\alpha$. Also, it means that the map $\rho$ is injective, and hence $F[\alpha]$ is isomorphic (as an $F$-algebra) to $F[X]$; in particular, $F[\alpha]$ is not finitely generated as a vector space over $F$.

Note that if $\alpha \in E$ has a minimal polynomial $\phi \neq 0$, then $\deg(\phi) > 0$, unless $E$ is trivial (i.e., $1_E = 0_E$), in which case $\phi = 1$.

Example 16.17. Consider the real numbers $\sqrt{2}$ and $\sqrt[3]{2}$.

We claim that $X^2 - 2$ is the minimal polynomial of $\sqrt{2}$ over $\mathbb{Q}$. To see this, first observe that $\sqrt{2}$ is a root of $X^2 - 2$. Thus, the minimal polynomial of $\sqrt{2}$ divides $X^2 - 2$. However, as we saw in Example 16.13, the polynomial $X^2 - 2$ is irreducible over $\mathbb{Q}$, and hence must be equal to the minimal polynomial of $\sqrt{2}$ over $\mathbb{Q}$.

A similar argument shows that $X^3 - 2$ is the minimal polynomial of $\sqrt[3]{2}$ over $\mathbb{Q}$. We also see that $\mathbb{Q}[\sqrt{2}]$ is isomorphic (as a $\mathbb{Q}$-algebra) to $\mathbb{Q}[X]/(X^2 - 2)$, and since $X^2 - 2$ is irreducible, it follows that the ring $\mathbb{Q}[\sqrt{2}]$ is actually a field. As a vector space over $\mathbb{Q}$, $\mathbb{Q}[\sqrt{2}]$ has dimension 2, and every element of $\mathbb{Q}[\sqrt{2}]$ may be written uniquely as $a + b\sqrt{2}$ for $a, b \in \mathbb{Q}$. Indeed, for all $a, b \in \mathbb{Q}$, not both zero, the multiplicative inverse of $a + b\sqrt{2}$ is $(a/c) - (b/c)\sqrt{2}$, where $c := a^2 - 2b^2$ (note that $c \neq 0$, since $\sqrt{2}$ is irrational).

Similarly, $\mathbb{Q}[\sqrt[3]{2}]$ is a field and has dimension 3 as a vector space over $\mathbb{Q}$, and every element of $\mathbb{Q}[\sqrt[3]{2}]$ may be written uniquely as $a + b\sqrt[3]{2} + c\sqrt[3]{4}$ for $a, b, c \in \mathbb{Q}$. □
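The inverse formula in $\mathbb{Q}[\sqrt{2}]$ is easy to check with exact rational arithmetic. The sketch assumes an element $a + b\sqrt{2}$ is stored as the pair `(a, b)`. That representation and the names `mul_sqrt2` and `inv_sqrt2` are hypothetical. Multiplication uses $(a + b\sqrt{2})(c + d\sqrt{2}) = (ac + 2bd) + (ad + bc)\sqrt{2}$. The inverse is $(a - b\sqrt{2})/(a^2 - 2b^2)$.

```python
from fractions import Fraction


def mul_sqrt2(x, y):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)


def inv_sqrt2(x):
    """Inverse of a + b*sqrt2 as the pair (a/c, -b/c), with c = a^2 - 2b^2."""
    a, b = map(Fraction, x)
    c = a * a - 2 * b * b      # non-zero unless a = b = 0, as sqrt2 is irrational
    if c == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / c, -b / c)


x = (Fraction(3), Fraction(5))            # 3 + 5*sqrt2
print(inv_sqrt2(x))                       # (Fraction(-3, 41), Fraction(5, 41))
print(mul_sqrt2(x, inv_sqrt2(x)))         # product is 1 + 0*sqrt2
```

Using `Fraction` keeps every computation inside $\mathbb{Q}$, mirroring the fact that $\mathbb{Q}[\sqrt{2}]$ is a field whose arithmetic needs no approximation of $\sqrt{2}$.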

A simple but important fact is the following: 

Theorem 16.20. Suppose $E$ is an $F$-algebra, and that as an $F$-vector space, $E$ has finite dimension $n$. Then every $\alpha \in E$ has a non-zero minimal polynomial of degree at most $n$.

Proof. Indeed, the family of elements

$$1_E, \alpha, \ldots, \alpha^n$$

must be linearly dependent (as must any family of $n + 1$ elements of a vector space of dimension $n$), and hence there exist $c_0, \ldots, c_n \in F$, not all zero, such that

$$c_0 1_E + c_1 \alpha + \cdots + c_n \alpha^n = 0_E,$$

and therefore, the non-zero polynomial $f := \sum_{i=0}^{n} c_i X^i$ vanishes at $\alpha$. Since the minimal polynomial of $\alpha$ divides $f$, it is non-zero and has degree at most $\deg(f) \le n$. □

Example 16.18. Let $f \in F[X]$ be a monic polynomial of degree $\ell$, and consider the $F$-algebra $E := F[X]/(f) = F[\xi]$, where $\xi := [X]_f \in E$. Clearly, the minimal polynomial of $\xi$ over $F$ is $f$. Moreover, as a vector space over $F$, $E$ has dimension $\ell$, with $\{\xi^{i-1}\}_{i=1}^{\ell}$ being a basis. Therefore, every $\alpha \in E$ has a non-zero minimal polynomial of degree at most $\ell$. □

Exercise 16.7. In the field $E$ in Example 16.16, what is the minimal polynomial of 1011 over $\mathbb{Z}_2$?

Exercise 16.8. Let $\rho : E \to E'$ be an $F$-algebra homomorphism, let $\alpha \in E$, let $\phi$ be the minimal polynomial of $\alpha$ over $F$, and let $\phi'$ be the minimal polynomial of $\rho(\alpha)$ over $F$. Show that $\phi' \mid \phi$, and that $\phi' = \phi$ if $\rho$ is injective.

Exercise 16.9. Show that if the factorization of $f$ over $F[X]$ into monic irreducibles is $f = f_1^{e_1} \cdots f_r^{e_r}$, and if $\alpha = [h]_f \in F[X]/(f)$, then the minimal polynomial $\phi$ of $\alpha$ over $F$ is $\operatorname{lcm}(\phi_1, \ldots, \phi_r)$, where each $\phi_i$ is the minimal polynomial of $[h]_{f_i^{e_i}} \in F[X]/(f_i^{e_i})$ over $F$.


16.6 General properties of extension fields 

We now discuss a few general notions related to extension fields. These are all quite simple applications of the theory developed so far. Recall that if $F$ and $E$ are fields, with $F$ being a subring of $E$, then $F$ is called a subfield of $E$, and $E$ is called an extension field of $F$. As usual, we shall blur the distinction between a subring and a natural embedding; that is, if $\tau : F \to E$ is a natural embedding, we shall simply identify elements of $F$ with their images in $E$ under $\tau$, and in so doing, we may view $E$ as an extension field of $F$. Usually, the map $\tau$ will be clear from context; for example, if $E = F[X]/(f)$ for some irreducible polynomial $f \in F[X]$, then we shall simply say that $E$ is an extension field of $F$, although strictly speaking, $F$ is embedded in $E$ via the map that sends $c \in F$ to $[c]_f \in E$.

We start with some definitions. Let $E$ be an extension field of a field $F$. Then $E$ is an $F$-algebra via inclusion, and in particular, an $F$-vector space. If $E$ is a finite dimensional $F$-vector space, then we say that $E$ is a finite extension of $F$, and $\dim_F(E)$ is called the degree (over $F$) of the extension, and is denoted $(E : F)$; otherwise, we say that $E$ is an infinite extension of $F$.
An element $\alpha \in E$ is called algebraic over $F$ if there exists a non-zero polynomial $g \in F[X]$ such that $g(\alpha) = 0$, and in this case, we define the degree of $\alpha$ (over $F$) to be the degree of its minimal polynomial over $F$ (see §16.5); otherwise, $\alpha$ is called transcendental over $F$. If all elements of $E$ are algebraic over $F$, then we call $E$ an algebraic extension of $F$.

Suppose $E$ is an extension field of a field $F$. For $\alpha \in E$, we define

$$F(\alpha) := \{g(\alpha)/h(\alpha) : g, h \in F[X],\ h(\alpha) \neq 0\}.$$

It is easy to see that $F(\alpha)$ is a subfield of $E$, and indeed, it is the smallest subfield of $E$ containing $F$ and $\alpha$. Clearly, the ring $F[\alpha] = \{g(\alpha) : g \in F[X]\}$, which is the smallest subring of $E$ containing $F$ and $\alpha$, is a subring of $F(\alpha)$. We derive some basic properties of $F(\alpha)$ and $F[\alpha]$. The analysis naturally breaks down into two cases, depending on whether $\alpha$ is algebraic or transcendental over $F$.

On the one hand, suppose $\alpha$ is algebraic over $F$. Let $\phi$ be the minimal polynomial of $\alpha$ over $F$, so that $\deg(\phi) > 0$, and the quotient ring $F[X]/(\phi)$ is isomorphic (as an $F$-algebra) to the ring $F[\alpha]$ (see §16.5). Since $F[\alpha]$ is a subring of a field, it must be an integral domain, which implies that $F[X]/(\phi)$ is an integral domain, and so $\phi$ is irreducible. This in turn implies that $F[X]/(\phi)$ is a field, and so $F[\alpha]$ is not just a subring of $E$, it is a subfield of $E$. Since $F[\alpha]$ is itself already a subfield of $E$ containing $F$ and $\alpha$, it follows that $F(\alpha) = F[\alpha]$. Moreover, $F[\alpha]$ is a finite extension of $F$; indeed $(F[\alpha] : F) = \deg(\phi) =$ the degree of $\alpha$ over $F$, and the elements $1, \alpha, \ldots, \alpha^{m-1}$, where $m := \deg(\phi)$, form a basis for $F[\alpha]$ over $F$.

On the other hand, suppose that $\alpha$ is transcendental over $F$. In this case, the minimal polynomial of $\alpha$ over $F$ is the zero polynomial, and the ring $F[\alpha]$ is isomorphic (as an $F$-algebra) to the ring $F[X]$ (see §16.5), which is definitely not a field. But consider the "rational function evaluation map" that sends $g/h \in F(X)$ to $g(\alpha)/h(\alpha) \in F(\alpha)$. Since no non-zero polynomial over $F$ vanishes at $\alpha$, it is easy to see that this map is well defined, and is in fact an $F$-algebra isomorphism. Thus, we see that $F(\alpha)$ is isomorphic (as an $F$-algebra) to $F(X)$. It is also clear that $F(\alpha)$ is an infinite extension of $F$.

Let us summarize the above discussion in the following theorem:

Theorem 16.21. Let $E$ be an extension field of a field $F$.

(i) If $\alpha \in E$ is algebraic over $F$, then $F(\alpha) = F[\alpha]$, and $F[\alpha]$ is isomorphic (as an $F$-algebra) to $F[X]/(\phi)$, where $\phi$ is the minimal polynomial of $\alpha$ over $F$, which is irreducible; moreover, $F[\alpha]$ is a finite extension of $F$, and $(F[\alpha] : F) = \deg(\phi) =$ the degree of $\alpha$ over $F$, and the elements $1, \alpha, \ldots, \alpha^{m-1}$, where $m := \deg(\phi)$, form a basis for $F[\alpha]$ over $F$.

(ii) If $\alpha \in E$ is transcendental over $F$, then $F(\alpha)$ is isomorphic (as an $F$-algebra) to the rational function field $F(X)$, while the subring $F[\alpha]$ is isomorphic (as an $F$-algebra) to the ring of polynomials $F[X]$; moreover, $F(\alpha)$ is an infinite extension of $F$.

Suppose $E$ is an extension field of a field $K$, which itself is an extension of a field $F$. Then $E$ is also an extension field of $F$. The following theorem examines the relation between the degrees of these extensions, in the case where $E$ is a finite extension of $K$, and $K$ is a finite extension of $F$. The proof is a simple calculation, which we leave to the reader to verify.

Theorem 16.22. Suppose $E$ is a finite extension of a field $K$, with a basis $\{\beta_j\}_{j=1}^m$ over $K$, and $K$ is a finite extension of $F$, with a basis $\{\alpha_i\}_{i=1}^n$ over $F$. Then the elements

$$\alpha_i \beta_j \quad (i = 1, \ldots, n;\ j = 1, \ldots, m)$$

form a basis for $E$ over $F$. In particular, $E$ is a finite extension of $F$ and

$$(E : F) = (E : K)(K : F).$$

Now suppose that $E$ is a finite extension of a field $F$. Let $K$ be an intermediate field, that is, a subfield of $E$ containing $F$. Then evidently, $E$ is a finite extension of $K$ (since any basis for $E$ over $F$ also spans $E$ over $K$), and $K$ is a finite extension of $F$ (since as $F$-vector spaces, $K$ is a subspace of $E$). The previous theorem then implies that $(E : F) = (E : K)(K : F)$. We have proved:

Theorem 16.23. If $E$ is a finite extension of a field $F$, and $K$ is a subfield of $E$ containing $F$, then $E$ is a finite extension of $K$, $K$ is a finite extension of $F$, and $(E : F) = (E : K)(K : F)$.

Again, suppose that $E$ is a finite extension of a field $F$. Theorem 16.20 implies that $E$ is algebraic over $F$, and indeed, that each element of $E$ has degree over $F$ bounded by $(E : F)$. However, we can say a bit more about these degrees. Suppose $\alpha \in E$. Then the degree of $\alpha$ over $F$ is equal to $(F[\alpha] : F)$, and by the previous theorem, applied to $K := F[\alpha]$, we have $(E : F) = (E : F[\alpha])(F[\alpha] : F)$. In particular, the degree of $\alpha$ over $F$ divides $(E : F)$. We have proved:

Theorem 16.24. If $E$ is a finite extension of a field $F$, then it is an algebraic extension, and for each $\alpha \in E$, the degree of $\alpha$ over $F$ divides $(E : F)$.

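Example 16.19. Continuing with Example 16.17, we see that the real numbers $\sqrt{2}$ and $\sqrt[3]{2}$ are algebraic over $\mathbb{Q}$. The fields $\mathbb{Q}[\sqrt{2}]$ and $\mathbb{Q}[\sqrt[3]{2}]$ are extension fields of $\mathbb{Q}$, where $(\mathbb{Q}[\sqrt{2}] : \mathbb{Q}) = 2 =$ the degree of $\sqrt{2}$ over $\mathbb{Q}$, and $(\mathbb{Q}[\sqrt[3]{2}] : \mathbb{Q}) = 3 =$ the degree of $\sqrt[3]{2}$ over $\mathbb{Q}$. As both of these fields are finite extensions of $\mathbb{Q}$, they are algebraic extensions as well. Since their degrees over $\mathbb{Q}$ are prime numbers, it follows from Theorem 16.23 that they have no subfields other than themselves and $\mathbb{Q}$ (the degree over $\mathbb{Q}$ of any intermediate field must divide a prime). In particular, if $\alpha \in \mathbb{Q}[\sqrt{2}] \setminus \mathbb{Q}$, then $\mathbb{Q}[\alpha] = \mathbb{Q}[\sqrt{2}]$. Similarly, if $\alpha \in \mathbb{Q}[\sqrt[3]{2}] \setminus \mathbb{Q}$, then $\mathbb{Q}[\alpha] = \mathbb{Q}[\sqrt[3]{2}]$. □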

Example 16.20. Continuing with Example 16.18, suppose $f \in F[X]$ is a monic irreducible polynomial of degree $\ell$, so that $E := F[X]/(f) = F[\xi]$, where $\xi := [X]_f \in E$, is an extension field of $F$. The element $\xi$ is algebraic of degree $\ell$ over $F$. Moreover, $E$ is a finite extension of $F$, with $(E : F) = \ell$; in particular, $E$ is an algebraic extension of $F$, and for each $\alpha \in E$, the degree of $\alpha$ over $F$ divides $\ell$. □

As we have seen in Example 16.14, an irreducible polynomial over a field may 
be reducible when viewed as a polynomial over an extension field. A splitting 
field is a finite extension of the coefficient field in which a given polynomial splits 
completely into linear factors. As the next theorem shows, splitting fields always 
exist. 

Theorem 16.25. Let $F$ be a field, and $f \in F[X]$ a non-zero polynomial of degree $n$. Then there exists a finite extension $E$ of $F$ over which $f$ factors as

$$f = c(X - \alpha_1)(X - \alpha_2) \cdots (X - \alpha_n),$$

where $c \in F$ and $\alpha_1, \ldots, \alpha_n \in E$.

Proof. We may assume that $f$ is monic. We prove the existence of $E$ by induction on the degree $n$ of $f$. If $n = 0$, then the theorem is trivially true. Otherwise, let $h$ be an irreducible factor of $f$, and set $K := F[X]/(h)$, so that $\xi := [X]_h \in K$ is a root of $h$, and hence of $f$. So over $K$, which is a finite extension of $F$, the polynomial $f$ factors as

$$f = (X - \xi)g,$$

where $g \in K[X]$ is a monic polynomial of degree $n - 1$. Applying the induction hypothesis, there exists a finite extension $E$ of $K$ over which $g$ splits into linear factors. Thus, over $E$, $f$ splits into linear factors, and by Theorem 16.22, $E$ is a finite extension of $F$. □

Exercise 16.10. In the field $E$ in Example 16.16, find all the elements of degree 2 over $\mathbb{Z}_2$.

Exercise 16.11. Let $E$ be an extension field of a field $F$, and let $\alpha_1, \ldots, \alpha_n \in E$ be algebraic over $F$. Show that the ring $F[\alpha_1, \ldots, \alpha_n]$ (see Example 7.45) is in fact a field, and that $F[\alpha_1, \ldots, \alpha_n]$ is a finite (and hence algebraic) extension of $F$.
Exercise 16.12. Consider the real numbers $\sqrt{2}$ and $\sqrt[3]{2}$. Show that

$$(\mathbb{Q}[\sqrt{2}, \sqrt[3]{2}] : \mathbb{Q}) = (\mathbb{Q}[\sqrt{2} + \sqrt[3]{2}] : \mathbb{Q}) = 6.$$

Exercise 16.13. Consider the real numbers $\sqrt{2}$ and $\sqrt{3}$. Show that

$$(\mathbb{Q}[\sqrt{2}, \sqrt{3}] : \mathbb{Q}) = (\mathbb{Q}[\sqrt{2} + \sqrt{3}] : \mathbb{Q}) = 4.$$

Exercise 16.14. Show that if E is an algebraic extension of K, and K is an 
algebraic extension of F, then E is an algebraic extension of F. 

Exercise 16.15. Let E be an extension of F. Show that the set of all elements 
of E that are algebraic over F is a subfield of E containing F. 

Exercise 16.16. Consider a field $F$ and its field of rational functions $F(X)$. Let $\alpha \in F(X) \setminus F$. Show that $X$ is algebraic over $F(\alpha)$, and that $\alpha$ is transcendental over $F$.

Exercise 16.17. Let $E$ be an extension field of a field $F$. Suppose $\alpha \in E$ is transcendental over $F$, and that $E$ is algebraic over $F(\alpha)$. Show that for every $\beta \in E$, $\beta$ is transcendental over $F$ if and only if $E$ is algebraic over $F(\beta)$.


16.7 Formal derivatives 

Throughout this section, R denotes a ring. 

Consider a polynomial $g \in R[X]$. If $Y$ is another indeterminate, we may evaluate $g$ at $X + Y$, and collecting monomials of like degree in $Y$, we may write

$$g(X + Y) = g_0 + g_1 Y + g_2 Y^2 + \cdots, \tag{16.5}$$

where $g_i \in R[X]$ for $i = 0, 1, 2, \ldots$. Evidently, $g_0 = g$ (just substitute 0 for $Y$ in (16.5)), and we may write

$$g(X + Y) \equiv g + g_1 Y \pmod{Y^2}. \tag{16.6}$$

We define the formal derivative of $g$, denoted $\mathbf{D}(g)$, to be the unique polynomial $g_1 \in R[X]$ satisfying (16.6). We stress that unlike the "analytical" notion of derivative from calculus, which is defined in terms of limits, this definition is purely "symbolic." Nevertheless, some of the usual rules for derivatives still hold:

Theorem 16.26. We have:

(i) $\mathbf{D}(c) = 0$ for all $c \in R$;

(ii) $\mathbf{D}(X) = 1$;

(iii) $\mathbf{D}(g + h) = \mathbf{D}(g) + \mathbf{D}(h)$ for all $g, h \in R[X]$;

(iv) $\mathbf{D}(gh) = \mathbf{D}(g)h + g\mathbf{D}(h)$ for all $g, h \in R[X]$.
Proof. Parts (i) and (ii) are immediate from the definition. Parts (iii) and (iv) follow from the definition by a simple calculation. Suppose

$$g(X + Y) \equiv g + g_1 Y \pmod{Y^2} \quad \text{and} \quad h(X + Y) \equiv h + h_1 Y \pmod{Y^2},$$

where $g_1 = \mathbf{D}(g)$ and $h_1 = \mathbf{D}(h)$. Then

$$(g + h)(X + Y) = g(X + Y) + h(X + Y) \equiv (g + h) + (g_1 + h_1)Y \pmod{Y^2},$$

and

$$(gh)(X + Y) = g(X + Y)h(X + Y) \equiv gh + (g_1 h + g h_1)Y \pmod{Y^2}. \quad □$$

Combining parts (i) and (iv) of this theorem, we see that $\mathbf{D}(cg) = c\mathbf{D}(g)$ for all $c \in R$ and $g \in R[X]$. This fact can also be easily derived directly from the definition of the derivative.

Combining parts (ii) and (iv) of this theorem, together with a simple induction argument, we see that $\mathbf{D}(X^n) = nX^{n-1}$ for all positive integers $n$. This fact can also be easily derived directly from the definition of the derivative by considering the binomial expansion of $(X + Y)^n$.

Combining part (iii) of this theorem and the observations in the previous two paragraphs, we see that for any polynomial $g = \sum_{i=0}^{k} a_i X^i \in R[X]$, we have

$$\mathbf{D}(g) = \sum_{i=1}^{k} i a_i X^{i-1}, \tag{16.7}$$

which agrees with the usual formula for the derivative of a polynomial.
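The definition via (16.6) and the closed formula (16.7) can be compared mechanically. This sketch takes $R = \mathbb{Z}$, or $R = \mathbb{Z}_n$ when a modulus is supplied. It assumes coefficient lists are stored lowest degree first, and the names `derivative`, `shift_expand`, and `y_coeff` are hypothetical. `shift_expand` expands $g(X + Y)$ with the binomial theorem. The coefficient of $Y^1$ is then read off and should equal `derivative(g)`. The last line shows a purely formal effect: over $\mathbb{Z}_3$, $\mathbf{D}(X^3) = 3X^2 = 0$.

```python
from math import comb


def derivative(coeffs, n=None):
    """Formal derivative by (16.7): D(sum a_i X^i) = sum i*a_i X^(i-1).

    coeffs lists a_0, a_1, ... (lowest degree first).  If n is given,
    coefficients are reduced modulo n (R = Z_n); otherwise R = Z.
    """
    d = [i * a for i, a in enumerate(coeffs)][1:]
    if n is not None:
        d = [c % n for c in d]
    while d and d[-1] == 0:
        d.pop()
    return d


def shift_expand(coeffs):
    """Return g(X + Y) as a dict {(i, j): c}, meaning c * X^i * Y^j."""
    terms = {}
    for k, a in enumerate(coeffs):
        for j in range(k + 1):          # a * C(k, j) * X^(k-j) * Y^j
            key = (k - j, j)
            terms[key] = terms.get(key, 0) + a * comb(k, j)
    return terms


def y_coeff(terms, j):
    """Collect g_j, the coefficient polynomial of Y^j, lowest degree first."""
    deg = max((i for (i, jj) in terms if jj == j), default=-1)
    out = [terms.get((i, j), 0) for i in range(deg + 1)]
    while out and out[-1] == 0:
        out.pop()
    return out


g = [5, 3, 0, 4]                        # 5 + 3X + 4X^3
print(derivative(g))                     # [3, 0, 12]
print(y_coeff(shift_expand(g), 1))       # [3, 0, 12], the g_1 of (16.5)
print(derivative([0, 0, 0, 1], n=3))     # [], since 3X^2 = 0 in Z_3
```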

The notion of a formal derivative can be generalized to multi-variate polynomials. Let $g \in R[X_1, \ldots, X_n]$. For any $i = 1, \ldots, n$, we can view $g$ as a polynomial in the variable $X_i$, whose coefficients are elements of $R[X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n]$. Then if we formally differentiate with respect to the variable $X_i$, we obtain the formal "partial" derivative $\mathbf{D}_{X_i}(g)$.


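Exercise 16.18. Show that for $g_1, \ldots, g_n \in R[X]$, we have

$$\mathbf{D}\Bigl(\prod_{i=1}^{n} g_i\Bigr) = \sum_{i=1}^{n} \mathbf{D}(g_i) \prod_{j \neq i} g_j,$$

and that for $g \in R[X]$, and $n \ge 1$, we have

$$\mathbf{D}(g^n) = n g^{n-1} \mathbf{D}(g).$$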

Exercise 16.19. Prove the "chain rule" for formal derivatives: if $g, h \in R[X]$ and $f = g(h) \in R[X]$, then $\mathbf{D}(f) = \mathbf{D}(g)(h) \cdot \mathbf{D}(h)$; more generally, if $g \in R[X_1, \ldots, X_n]$, and $h_1, \ldots, h_n \in R[X]$, and $f = g(h_1, \ldots, h_n) \in R[X]$, then

$$\mathbf{D}_X(f) = \sum_{i=1}^{n} \mathbf{D}_{X_i}(g)(h_1, \ldots, h_n)\, \mathbf{D}_X(h_i).$$

Exercise 16.20. Let $g \in R[X]$, and let $x \in R$ be a root of $g$. Show that $x$ is a multiple root of $g$ if and only if $x$ is also a root of $\mathbf{D}(g)$ (see Exercise 7.18).

Exercise 16.21. Let $g \in R[X]$ with $\deg(g) = k \ge 0$, and let $x \in R$. Show that if we evaluate $g$ at $X + x$, writing

$$g(X + x) = \sum_{i=0}^{k} b_i X^i,$$

with $b_0, \ldots, b_k \in R$, then we have

$$i! \cdot b_i = (\mathbf{D}^i(g))(x) \quad \text{for } i = 0, \ldots, k.$$

Exercise 16.22. Suppose $p$ is a prime, $g \in \mathbb{Z}[X]$, and $x \in \mathbb{Z}$, such that $g(x) \equiv 0 \pmod{p}$ and $\mathbf{D}(g)(x) \not\equiv 0 \pmod{p}$. Show that for every positive integer $e$, there exists an integer $\hat{x}$ such that $g(\hat{x}) \equiv 0 \pmod{p^e}$, and give an efficient procedure to compute such an $\hat{x}$, given $p$, $g$, $x$, and $e$. Hint: mimic the "lifting" procedure discussed in §12.5.2.


16.8 Formal power series and Laurent series 

We discuss generalizations of polynomials that allow an infinite number of non-zero coefficients. Although we are mainly interested in the case where the coefficients come from a field $F$, we develop the basic theory for general rings $R$.


16.8.1 Formal power series 

The ring $R[[X]]$ of formal power series over $R$ consists of all formal expressions of the form

$$g = a_0 + a_1 X + a_2 X^2 + \cdots,$$

where $a_0, a_1, a_2, \ldots \in R$. Unlike ordinary polynomials, we allow an infinite number of non-zero coefficients. We may write such a formal power series as

$$g = \sum_{i=0}^{\infty} a_i X^i.$$
Formally, such a formal power series is an infinite sequence $\{a_i\}_{i=0}^{\infty}$, and the rules for addition and multiplication are exactly the same as for polynomials. Indeed, the formulas (7.2) and (7.3) in §7.2 for addition and multiplication may be applied directly: all of the relevant sums are finite, and so everything is well defined. We leave it to the reader to verify that with addition and multiplication so defined, $R[[X]]$ indeed forms a ring. We shall not attempt to interpret a formal power series as a function, and therefore, "convergence" issues shall simply not arise.

Clearly, $R[[X]]$ contains $R[X]$ as a subring. Let us consider the group of units of $R[[X]]$.

Theorem 16.27. Let $g = \sum_{i=0}^{\infty} a_i X^i \in R[[X]]$. Then $g \in (R[[X]])^*$ if and only if $a_0 \in R^*$.

Proof. If $a_0$ is not a unit, then it is clear that $g$ is not a unit, since the constant term of a product of formal power series is equal to the product of the constant terms.

Conversely, if $a_0$ is a unit, we show how to define the coefficients of the inverse $h = \sum_{i=0}^{\infty} b_i X^i$ of $g$. Let $f = gh = \sum_{i=0}^{\infty} c_i X^i$. We want $f = 1$, which means that $c_0 = 1$ and $c_i = 0$ for all $i > 0$. Now, $c_0 = a_0 b_0$, so we set $b_0 := a_0^{-1}$. Next, we have $c_1 = a_0 b_1 + a_1 b_0$, so we set $b_1 := -a_1 b_0 \cdot a_0^{-1}$. Next, we have $c_2 = a_0 b_2 + a_1 b_1 + a_2 b_0$, so we set $b_2 := -(a_1 b_1 + a_2 b_0) \cdot a_0^{-1}$. Continuing in this way, we see that if we define $b_i := -(a_1 b_{i-1} + \cdots + a_i b_0) \cdot a_0^{-1}$ for $i \ge 1$, then $gh = 1$. □
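The recurrence in this proof computes the inverse one coefficient at a time. Only finitely many coefficients can be produced, so the sketch below returns the first $n$ of them. It assumes $R = \mathbb{Q}$, using Python's `fractions.Fraction` for exact arithmetic. The proof itself works over any ring in which $a_0$ is a unit. The name `series_inverse` is hypothetical. Coefficients $a_j$ beyond the given list are treated as 0, so a polynomial input is handled naturally.

```python
from fractions import Fraction


def series_inverse(a, n):
    """First n coefficients b_0, ..., b_{n-1} of the inverse of sum a_i X^i.

    Uses b_0 = a_0^{-1} and b_i = -(a_1 b_{i-1} + ... + a_i b_0) * a_0^{-1},
    working over Q; coefficients a_j with j >= len(a) are taken to be 0.
    """
    if not a or a[0] == 0:
        raise ValueError("the constant term must be a unit")
    inv_a0 = 1 / Fraction(a[0])
    b = [inv_a0]
    for i in range(1, n):
        s = sum(Fraction(a[j]) * b[i - j] for j in range(1, i + 1) if j < len(a))
        b.append(-s * inv_a0)
    return b


# The inverse of 1 - X: every computed coefficient equals 1.
print(series_inverse([1, -1], 5))
# The inverse of 2 + 3X + X^2 begins 1/2 - (3/4)X + (7/8)X^2 + ...
print(series_inverse([2, 3, 1], 3))
```

The first call illustrates the example that follows: the multiplicative inverse of $1 - X$ is the power series whose coefficients are all 1.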

Example 16.21. In the ring $R[[X]]$, the multiplicative inverse of $1 - X$ is $\sum_{n=0}^{\infty} X^n$. □

Exercise 16.23. Let $F$ be a field. Show that every non-zero ideal of $F[[X]]$ is of the form $(X^m)$ for some uniquely determined integer $m \ge 0$.


16.8.2 Formal Laurent series 

One may generalize formal power series to allow a finite number of negative powers of $X$. The ring $R((X))$ of formal Laurent series over $R$ consists of all formal expressions of the form

$$g = a_m X^m + a_{m+1} X^{m+1} + \cdots,$$

where $m$ is allowed to be any integer (possibly negative), and $a_m, a_{m+1}, \ldots \in R$. Thus, elements of $R((X))$ may have an infinite number of terms involving positive powers of $X$, but only a finite number of terms involving negative powers of $X$. We may write such a formal Laurent series as

$$g = \sum_{i=m}^{\infty} a_i X^i.$$
Formally, such a formal Laurent series is a doubly infinite sequence $\{a_i\}_{i=-\infty}^{\infty}$, with the restriction that for some integer $m$, we have $a_i = 0$ for all $i < m$. We may again use the usual formulas (7.2) and (7.3) to define addition and multiplication (where the indices $i$, $j$, and $k$ now range over all integers, not just the non-negative integers). Note that while the sum in (7.3) has an infinite number of terms, only finitely many of them are non-zero.

One may naturally view $R[[X]]$ as a subring of $R((X))$, and of course, $R[X]$ is a subring of $R[[X]]$ and so also a subring of $R((X))$.

Theorem 16.28. If D is an integral domain, then D((X)) is an integral domain. 

Proof. Let $g = \sum_{i=m}^{\infty} a_i X^i$ and $h = \sum_{i=n}^{\infty} b_i X^i$, where $a_m \neq 0$ and $b_n \neq 0$. Then $gh = \sum_{i=m+n}^{\infty} c_i X^i$, where $c_{m+n} = a_m b_n \neq 0$. □

Theorem 16.29. Let $g \in R((X))$, and suppose that $g \ne 0$ and $g = \sum_{i=m}^{\infty} a_i X^i$ with $a_m \in R^*$. Then $g$ has a multiplicative inverse in $R((X))$.

Proof. We can write $g = X^m g'$, where $g'$ is a formal power series whose constant term is a unit, and hence there is a formal power series $h$ such that $g'h = 1$. Thus, $X^{-m} h$ is the multiplicative inverse of $g$ in $R((X))$. □

As an immediate corollary, we have: 

Theorem 16.30. If $F$ is a field, then $F((X))$ is a field.


Exercise 16.24. Let $F$ be a field. Show that $F((X))$ is the field of fractions of $F[[X]]$; that is, there is no subfield $E \subsetneq F((X))$ that contains $F[[X]]$.


16.8.3 Reversed Laurent series 

While formal Laurent series are useful in some situations, in many others, it is more useful and natural to consider reversed Laurent series over $R$. These are formal expressions of the form

$$g = \sum_{i=-\infty}^{m} a_i X^i,$$

where $a_m, a_{m-1}, \ldots \in R$. Thus, in a reversed Laurent series, we allow an infinite number of terms involving negative powers of $X$, but only a finite number of terms involving positive powers of $X$. Formally, such a reversed Laurent series is a doubly infinite sequence $\{a_i\}_{i=-\infty}^{\infty}$, with the restriction that for some integer $m$, we have $a_i = 0$ for all $i > m$. We may again use the usual formulas (7.2) and (7.3) to define addition and multiplication; again, the sum in (7.3) has only finitely many non-zero terms.

The ring of all reversed Laurent series is denoted $R((X^{-1}))$, and as the notation suggests, the map that sends $X$ to $X^{-1}$ (and acts as the identity on $R$) is an $R$-algebra isomorphism of $R((X))$ with $R((X^{-1}))$. Also, one may naturally view $R[X]$ as a subring of $R((X^{-1}))$.

For $g = \sum_{i=-\infty}^{m} a_i X^i \in R((X^{-1}))$ with $a_m \ne 0$, let us define the degree of $g$, denoted $\deg(g)$, to be the value $m$, and the leading coefficient of $g$, denoted $\mathrm{lc}(g)$, to be the value $a_m$. As for ordinary polynomials, we define the degree of 0 to be $-\infty$, and the leading coefficient of 0 to be 0. Note that if $g$ happens to be a polynomial, then these definitions of degree and leading coefficient agree with those for ordinary polynomials.

Theorem 16.31. For $g, h \in R((X^{-1}))$, we have $\deg(gh) \le \deg(g) + \deg(h)$, where equality holds unless both $\mathrm{lc}(g)$ and $\mathrm{lc}(h)$ are zero divisors. Furthermore, if $h \ne 0$ and $\mathrm{lc}(h)$ is a unit, then $h$ is a unit, and we have $\deg(gh^{-1}) = \deg(g) - \deg(h)$.

Proof. Exercise. □ 

It is also natural to define a floor function for reversed Laurent series: for $g \in R((X^{-1}))$ with $g = \sum_{i=-\infty}^{m} a_i X^i$, we define

$$\lfloor g \rfloor := \sum_{i=0}^{m} a_i X^i \in R[X];$$

that is, we compute the floor function by simply throwing away all terms involving negative powers of $X$.

Theorem 16.32. Let $g, h \in R[X]$ with $h \ne 0$ and $\mathrm{lc}(h) \in R^*$, and using the usual division with remainder property for polynomials, write $g = hq + r$, where $q, r \in R[X]$ with $\deg(r) < \deg(h)$. Let $h^{-1}$ denote the multiplicative inverse of $h$ in $R((X^{-1}))$. Then $q = \lfloor gh^{-1} \rfloor$.

Proof. Multiplying the equation $g = hq + r$ by $h^{-1}$, we obtain $gh^{-1} = q + rh^{-1}$, and $\deg(rh^{-1}) = \deg(r) - \deg(h) < 0$ by Theorem 16.31, from which it follows that $\lfloor gh^{-1} \rfloor = q$. □
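Theorem 16.32 suggests computing a polynomial quotient by inverting $h$ in $R((X^{-1}))$. Substituting $Y = X^{-1}$ writes $h = X^d(h_d + h_{d-1}Y + \cdots + h_0 Y^d)$, that is, $X^d$ times a power series in $Y$ whose constant term is the unit $\mathrm{lc}(h)$, so only finitely many coefficients of $h^{-1}$ are needed to determine $\lfloor gh^{-1} \rfloor$. A minimal Python sketch over $\mathbb{Q}$, assuming coefficient lists stored lowest degree first and using the hypothetical name `floor_quotient`:

```python
from fractions import Fraction


def floor_quotient(g, h):
    """Quotient of g by h in Q[X], computed as floor(g * h^{-1}) in Q((X^{-1})).
    g and h are coefficient lists, lowest degree first, with h[-1] != 0."""
    e, d = len(g) - 1, len(h) - 1
    if e < d:
        return [Fraction(0)]
    n = e - d + 1                 # number of terms of g*h^{-1} of degree >= 0
    # With Y = X^{-1}: h = X^d * u(Y) and g = X^e * v(Y), where u and v are the
    # reversed coefficient lists; u has the unit lc(h) as its constant term.
    u = [Fraction(c) for c in reversed(h)]
    v = [Fraction(c) for c in reversed(g)]
    # First n coefficients of u(Y)^{-1}, via the power-series inverse recurrence.
    w = [1 / u[0]]
    for i in range(1, n):
        s = sum(u[j] * w[i - j] for j in range(1, min(i, d) + 1))
        w.append(-s / u[0])
    # The coefficient of Y^k in v * u^{-1} multiplies X^(e-d-k); keep k = 0..e-d.
    t = [sum(v[j] * w[k - j] for j in range(k + 1)) for k in range(n)]
    return t[::-1]                # back to lowest degree first


# X^3 + 2X + 5 = (X - 1)(X^2 + X + 3) + 8
print(floor_quotient([5, 2, 0, 1], [-1, 1]))
```

The printed quotient has coefficients 3, 1, 1 (lowest degree first), i.e. $q = X^2 + X + 3$. Schoolbook long division does the same arithmetic; the point here is only to make the identity $q = \lfloor gh^{-1} \rfloor$ concrete.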

Let $F$ be a field, so that $F((X^{-1}))$ is also a field (this is immediate from Theorem 16.31). Now, $F((X^{-1}))$ contains $F[X]$ as a subring, and hence contains (an isomorphic copy of) the rational function field $F(X)$. Just as $F(X)$ corresponds to the field of rational numbers, $F((X^{-1}))$ corresponds to the field of real numbers. Indeed, we can think of real numbers as decimal numbers with a finite number of digits to the left of the decimal point and an infinite number to the right, and reversed Laurent series have a similar “syntactic” structure. In many ways, this syntactic similarity between the real numbers and reversed Laurent series is more than just superficial.


Exercise 16.25. Write down the rule for determining the multiplicative inverse of an element of $R((X^{-1}))$ whose leading coefficient is a unit in $R$.

Exercise 16.26. Let $F$ be a field of characteristic other than 2. Show that a non-zero $g \in F((X^{-1}))$ has a square root in $F((X^{-1}))$ if and only if $\deg(g)$ is even and $\mathrm{lc}(g)$ has a square root in $F$.

Exercise 16.27. Let $R$ be a ring, and let $a \in R$. Show that the multiplicative inverse of $X - a$ in $R((X^{-1}))$ is $\sum_{j=1}^{\infty} a^{j-1} X^{-j}$.

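Exercise 16.28. Let $R$ be an arbitrary ring, let $a_1, \ldots, a_\ell \in R$, and let

$$f := (X - a_1)(X - a_2) \cdots (X - a_\ell) \in R[X].$$

For $j \ge 0$, define the “power sum”

$$s_j := \sum_{i=1}^{\ell} a_i^j.$$

Show that in the ring $R((X^{-1}))$, we have

$$\frac{D(f)}{f} = \sum_{i=1}^{\ell} \frac{1}{X - a_i} = \sum_{j=1}^{\infty} s_{j-1} X^{-j},$$

where $D(f)$ is the formal derivative of $f$.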



Exercise 16.29. Continuing with the previous exercise, derive Newton’s identities, which state that if $f = X^\ell + c_1 X^{\ell-1} + \cdots + c_\ell$, with $c_1, \ldots, c_\ell \in R$, then

$$\begin{aligned}
s_1 + c_1 &= 0\\
s_2 + c_1 s_1 + 2c_2 &= 0\\
s_3 + c_1 s_2 + c_2 s_1 + 3c_3 &= 0\\
&\ \ \vdots\\
s_\ell + c_1 s_{\ell-1} + \cdots + c_{\ell-1} s_1 + \ell c_\ell &= 0\\
s_{j+\ell} + c_1 s_{j+\ell-1} + \cdots + c_{\ell-1} s_{j+1} + c_\ell s_j &= 0 \quad (j \ge 1).
\end{aligned}$$
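Newton’s identities are easy to spot-check numerically. The Python sketch below (hypothetical helper names) builds $f$ from a list of integer roots $a_1, \ldots, a_\ell$, computes the power sums $s_j$, and returns the left-hand side of each identity, all of which should be zero. This is a sanity check on examples, not a proof.

```python
def monic_from_roots(roots):
    """Coefficients [1, c_1, ..., c_l] of f = (X - a_1)...(X - a_l), highest degree first."""
    f = [1]
    for a in roots:
        # f * (X - a): shift f up one degree, then subtract a * f
        f = [x - a * y for x, y in zip(f + [0], [0] + f)]
    return f


def newton_residuals(roots, extra=3):
    """Left-hand sides of Newton's identities for f = prod (X - a_i); all should be 0.
    The second family of identities is checked for j = 1, ..., extra."""
    ell = len(roots)
    c = monic_from_roots(roots)                                    # c[k] = c_k, c[0] = 1
    s = [sum(a ** j for a in roots) for j in range(ell + extra + 1)]  # s[j] = s_j
    out = []
    for k in range(1, ell + 1):
        # s_k + c_1 s_{k-1} + ... + c_{k-1} s_1 + k c_k
        out.append(s[k] + sum(c[i] * s[k - i] for i in range(1, k)) + k * c[k])
    for j in range(1, extra + 1):
        # s_{j+l} + c_1 s_{j+l-1} + ... + c_l s_j
        out.append(sum(c[i] * s[j + ell - i] for i in range(ell + 1)))
    return out


print(newton_residuals([3, -1, 4]))
```

For the roots $3, -1, 4$ we get $f = X^3 - 6X^2 + 5X + 12$, and the call prints a list of six zeros.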
16.9 Unique factorization domains (*)

As we have seen, both the ring of integers and the ring of polynomials over a field enjoy a unique factorization property. These are special cases of a more general phenomenon, which we explore here.

Throughout this section, D denotes an integral domain. 

We call $a, b \in D$ associate if $a = ub$ for some $u \in D^*$. Equivalently, $a$ and $b$ are associate if and only if $a \mid b$ and $b \mid a$ (see part (i) of Theorem 7.4). A non-zero element $p \in D$ is called irreducible if it is not a unit, and all divisors of $p$ are associate to 1 or $p$. Equivalently, a non-zero, non-unit $p \in D$ is irreducible if and only if it cannot be expressed as $p = ab$ where neither $a$ nor $b$ are units.

Definition 16.33. We call D a unique factorization domain (UFD) if 

(i) every non-zero element of D that is not a unit can be written as a product 
of irreducibles in D, and 

(ii) such a factorization into irreducibles is unique up to associates and the order 
in which the factors appear. 

Another way to state part (ii) of the above definition is that if $p_1 \cdots p_r$ and $p'_1 \cdots p'_s$ are two factorizations of some element as a product of irreducibles, then $r = s$, and there exists a permutation $\pi$ on the indices $\{1, \ldots, r\}$ such that $p_i$ and $p'_{\pi(i)}$ are associate.

As we have seen, both $\mathbb{Z}$ and $F[X]$ are UFDs. In both of those cases, we chose to single out a distinguished irreducible element among all those associate to any given irreducible: for $\mathbb{Z}$, we always chose positive primes, and for $F[X]$, we chose monic irreducible polynomials. For any specific unique factorization domain $D$, there may be such a natural choice, but in the general case, there will not be (but see Exercise 16.30 below).

Example 16.22. Having already seen two examples of UFDs, it is perhaps a good idea to look at an example of an integral domain that is not a UFD. Consider the subring $\mathbb{Z}[\sqrt{-3}]$ of the complex numbers, which consists of all complex numbers of the form $a + b\sqrt{-3}$, where $a, b \in \mathbb{Z}$. As this is a subring of the field $\mathbb{C}$, it is an integral domain (one may also view $\mathbb{Z}[\sqrt{-3}]$ as the quotient ring $\mathbb{Z}[X]/(X^2 + 3)$).

Let us first determine the units in $\mathbb{Z}[\sqrt{-3}]$. For $a, b \in \mathbb{Z}$, we have $N(a + b\sqrt{-3}) = a^2 + 3b^2$, where $N$ is the usual norm map on $\mathbb{C}$ (see Example 7.5). If $\alpha \in \mathbb{Z}[\sqrt{-3}]$ is a unit, then there exists $\alpha' \in \mathbb{Z}[\sqrt{-3}]$ such that $\alpha\alpha' = 1$. Taking norms, we obtain

$$1 = N(1) = N(\alpha\alpha') = N(\alpha)N(\alpha').$$

Since the norm of an element of $\mathbb{Z}[\sqrt{-3}]$ is a non-negative integer, this implies that $N(\alpha) = 1$. If $\alpha = a + b\sqrt{-3}$, with $a, b \in \mathbb{Z}$, then $N(\alpha) = a^2 + 3b^2$, and it is clear that $N(\alpha) = 1$ if and only if $\alpha = \pm 1$. We conclude that the only units in $\mathbb{Z}[\sqrt{-3}]$ are $\pm 1$.

Now consider the following two factorizations of 4 in $\mathbb{Z}[\sqrt{-3}]$:

$$4 = 2 \cdot 2 = (1 + \sqrt{-3})(1 - \sqrt{-3}). \qquad (16.8)$$

We claim that 2 is irreducible. For suppose, say, that $2 = \alpha\alpha'$, for $\alpha, \alpha' \in \mathbb{Z}[\sqrt{-3}]$, with neither a unit. Taking norms, we have $4 = N(2) = N(\alpha)N(\alpha')$, and therefore, $N(\alpha) = N(\alpha') = 2$; but this is impossible, since there are no integers $a$ and $b$ such that $a^2 + 3b^2 = 2$. By the same reasoning, since $N(1 + \sqrt{-3}) = N(1 - \sqrt{-3}) = 4$, we see that $1 + \sqrt{-3}$ and $1 - \sqrt{-3}$ are both irreducible. Further, it is clear that 2 is not associate to either $1 + \sqrt{-3}$ or $1 - \sqrt{-3}$, and so the two factorizations of 4 in (16.8) are fundamentally different. □
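The norm argument in Example 16.22 can be confirmed by a small exhaustive search. The sketch below represents $a + b\sqrt{-3}$ as the pair `(a, b)`, a representation chosen here for illustration. An element of norm at most 2 must have $|a| \le 1$ and $b = 0$, so searching $|a|, |b| \le 2$ is more than enough.

```python
def norm(a, b):
    """N(a + b*sqrt(-3)) = a^2 + 3 b^2."""
    return a * a + 3 * b * b


def mul(x, y):
    """(a + b*sqrt(-3)) * (c + d*sqrt(-3)), with elements stored as pairs (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c - 3 * b * d, a * d + b * c)


# Any element of norm 1 or 2 has |a| <= 1 and b = 0, so this search is exhaustive.
elements = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
units = [x for x in elements if norm(*x) == 1]
norm_two = [x for x in elements if norm(*x) == 2]

print("units:", units)                        # only (-1, 0) and (1, 0)
print("elements of norm 2:", norm_two)        # none: 2 and 1 +- sqrt(-3) are irreducible
print(mul((2, 0), (2, 0)), mul((1, 1), (1, -1)))   # both products equal 4
```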

For $a, b \in D$, we call $d \in D$ a common divisor of $a$ and $b$ if $d \mid a$ and $d \mid b$; moreover, we call such a $d$ a greatest common divisor of $a$ and $b$ if all other common divisors of $a$ and $b$ divide $d$. We say that $a$ and $b$ are relatively prime if the only common divisors of $a$ and $b$ are units. It is immediate from the definition of a greatest common divisor that it is unique, up to multiplication by units, if it exists at all. Unlike in the case of $\mathbb{Z}$ and $F[X]$, in the general setting, greatest common divisors need not exist; moreover, even when they do, we shall not attempt to “normalize” greatest common divisors, and we shall speak only of “a” greatest common divisor, rather than “the” greatest common divisor.

Just as for integers and polynomials, we can generalize the notion of a greatest 
common divisor in an arbitrary integral domain D from two to any number of 
elements of D, and we can also define a least common multiple of any number of 
elements as well. 

Although these greatest common divisors and least common multiples need not exist in an arbitrary integral domain $D$, if $D$ is a UFD, they will always exist. The existence question easily reduces to the question of the existence of a greatest common divisor and least common multiple of $a$ and $b$, where $a$ and $b$ are non-zero elements of $D$. So assuming that $D$ is a UFD, we may write

$$a = u \prod_{i=1}^{r} p_i^{e_i} \quad\text{and}\quad b = v \prod_{i=1}^{r} p_i^{f_i},$$

where $u$ and $v$ are units, $p_1, \ldots, p_r$ are non-associate irreducibles, and $e_1, \ldots, e_r$ and $f_1, \ldots, f_r$ are non-negative integers, and it is easily seen that

$$\prod_{i=1}^{r} p_i^{\min(e_i, f_i)}$$

is a greatest common divisor of $a$ and $b$, while

$$\prod_{i=1}^{r} p_i^{\max(e_i, f_i)}$$

is a least common multiple of $a$ and $b$.
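In $\mathbb{Z}$, taking positive primes as the distinguished irreducibles, the min/max recipe can be sketched as follows. Trial-division factoring is used only to keep the sketch self-contained (it is far too slow for large inputs), and the names `factor` and `gcd_lcm_from_factorizations` are hypothetical.

```python
from collections import Counter


def factor(n):
    """Prime factorization of a positive integer n by trial division, as a Counter p -> e."""
    f, p = Counter(), 2
    while p * p <= n:
        while n % p == 0:
            f[p] += 1
            n //= p
        p += 1
    if n > 1:
        f[n] += 1
    return f


def gcd_lcm_from_factorizations(a, b):
    """gcd and lcm of positive integers via min/max of prime exponents."""
    fa, fb = factor(a), factor(b)
    g = l = 1
    for p in set(fa) | set(fb):      # a prime missing from a Counter has exponent 0
        g *= p ** min(fa[p], fb[p])
        l *= p ** max(fa[p], fb[p])
    return g, l


# 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7
print(gcd_lcm_from_factorizations(360, 84))    # (12, 2520)
```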

It is also evident that in a UFD $D$, if $c \mid ab$ and $c$ and $a$ are relatively prime, then $c \mid b$. In particular, if $p$ is irreducible and $p \mid ab$, then $p \mid a$ or $p \mid b$. This is equivalent to saying that if $p$ is irreducible, then the quotient ring $D/pD$ is an integral domain (and the ideal $pD$ is a prime ideal; see Exercise 7.38). The converse also holds:


Theorem 16.34. Suppose $D$ satisfies part (i) of Definition 16.33, and that $D/pD$ is an integral domain for every irreducible $p \in D$. Then $D$ is a UFD.

Proof. Exercise. □ 


Exercise 16.30. (a) Show that the “is associate to” relation is an equivalence relation.

(b) Consider an equivalence class $C$ induced by the “is associate to” relation. Show that if $C$ contains an irreducible element, then all elements of $C$ are irreducible.

(c) Suppose that for every equivalence class $C$ that contains irreducibles, we choose one element of $C$, and call it a distinguished irreducible. Show that $D$ is a UFD if and only if every non-zero element of $D$ can be expressed as $u p_1^{e_1} \cdots p_r^{e_r}$, where $u$ is a unit, $p_1, \ldots, p_r$ are distinguished irreducibles, and this expression is unique up to a reordering of the $p_i$’s.

Exercise 16.31. Show that the ring $\mathbb{Z}[\sqrt{-5}]$ is not a UFD.

Exercise 16.32. Let $D$ be a UFD and $F$ its field of fractions. Show that

(a) every element $x \in F$ can be expressed as $x = a/b$, where $a, b \in D$ are relatively prime, and

(b) that if $x = a/b$ for $a, b \in D$ relatively prime, then for any other $a', b' \in D$ with $x = a'/b'$, we have $a' = ca$ and $b' = cb$ for some $c \in D$.

Exercise 16.33. Let $D$ be a UFD and let $p \in D$ be irreducible. Show that there is no prime ideal $Q$ of $D$ with $\{0\} \subsetneq Q \subsetneq pD$ (see Exercise 7.38).
16.9.1 Unique factorization in Euclidean and principal ideal domains

Our proofs of the unique factorization property in both $\mathbb{Z}$ and $F[X]$ hinged on the division with remainder property for these rings. This notion can be generalized, as follows.

Definition 16.35. We say $D$ is a Euclidean domain if there is a “size function” $S$ mapping the non-zero elements of $D$ to the set of non-negative integers, such that for all $a, b \in D$ with $b \ne 0$, there exist $q, r \in D$, with the property that $a = bq + r$ and either $r = 0$ or $S(r) < S(b)$.

Example 16.23. Both $\mathbb{Z}$ and $F[X]$ are Euclidean domains. In $\mathbb{Z}$, we can take the ordinary absolute value function $|\cdot|$ as a size function, and for $F[X]$, the function $\deg(\cdot)$ will do. □

Example 16.24. Recall again the ring

$$\mathbb{Z}[i] = \{a + bi : a, b \in \mathbb{Z}\}$$

of Gaussian integers from Example 7.25. Let us show that this is a Euclidean domain, using the usual norm map $N$ on complex numbers (see Example 7.5) for the size function. Let $\alpha, \beta \in \mathbb{Z}[i]$, with $\beta \ne 0$. We want to show the existence of $\kappa, \rho \in \mathbb{Z}[i]$ such that $\alpha = \beta\kappa + \rho$, where $N(\rho) < N(\beta)$. Suppose that in the field $\mathbb{C}$, we compute $\alpha\beta^{-1} = r + si$, where $r, s \in \mathbb{Q}$. Let $m, n$ be integers such that $|m - r| \le 1/2$ and $|n - s| \le 1/2$; such integers $m$ and $n$ always exist, but may not be uniquely determined. Set $\kappa := m + ni \in \mathbb{Z}[i]$ and $\rho := \alpha - \beta\kappa$. Then we have

$$\alpha\beta^{-1} = \kappa + \delta,$$

where $\delta \in \mathbb{C}$ with $N(\delta) \le 1/4 + 1/4 = 1/2$, and

$$\rho = \alpha - \beta\kappa = \alpha - \beta(\alpha\beta^{-1} - \delta) = \delta\beta,$$

and hence

$$N(\rho) = N(\delta\beta) = N(\delta)N(\beta) \le \tfrac{1}{2} N(\beta).$$ □
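Example 16.24 is constructive: rounding the real and imaginary parts of $\alpha\beta^{-1}$ gives $\kappa$. The Python sketch below stores $a + bi$ as the pair `(a, b)` (the representation the text adopts from Exercise 16.38 on) and does the rounding with exact integer arithmetic, using $\alpha\beta^{-1} = \alpha\bar{\beta}/N(\beta)$ to avoid floating point. The name `gauss_divmod` is hypothetical; ties are rounded upward, which is one of the permitted choices of $m$ and $n$.

```python
def gauss_divmod(alpha, beta):
    """Division with remainder in Z[i], following Example 16.24.
    Gaussian integers are pairs (a, b) meaning a + b*i.  Returns (kappa, rho)
    with alpha = beta*kappa + rho and N(rho) <= N(beta)/2."""
    a, b = alpha
    c, d = beta
    n = c * c + d * d                    # N(beta)
    if n == 0:
        raise ZeroDivisionError("beta must be non-zero")
    # alpha * conj(beta) = (ac + bd) + (bc - ad) i, and alpha/beta is that over N(beta)
    x, y = a * c + b * d, b * c - a * d
    # nearest integers to x/n and y/n: floor(x/n + 1/2), computed exactly
    m = (2 * x + n) // (2 * n)
    k = (2 * y + n) // (2 * n)
    # rho := alpha - beta*kappa, where beta*kappa = (cm - dk) + (ck + dm) i
    rho = (a - (c * m - d * k), b - (c * k + d * m))
    return (m, k), rho


print(gauss_divmod((7, 2), (2, -1)))     # ((2, 2), (1, 0))
```

For $\alpha = 7 + 2i$ and $\beta = 2 - i$ this gives $\kappa = 2 + 2i$ and $\rho = 1$, and indeed $N(\rho) = 1 \le N(\beta)/2 = 5/2$.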

Theorem 16.36. If $D$ is a Euclidean domain and $I$ is an ideal of $D$, then there exists $d \in D$ such that $I = dD$.

Proof. If $I = \{0\}$, then $d = 0$ does the job, so let us assume that $I \ne \{0\}$. Let $d$ be any non-zero element of $I$ such that $S(d)$ is minimal, where $S$ is a size function that makes $D$ into a Euclidean domain. We claim that $I = dD$.

Since $d \in I$, we have $dD \subseteq I$, so it will suffice to show that for all $c \in I$, we have $d \mid c$. Now, we know that there exist $q, r \in D$ such that $c = dq + r$, where either $r = 0$ or $S(r) < S(d)$. If $r = 0$, we are done; otherwise, $r = c - dq$ is a non-zero element of $I$ with $S(r) < S(d)$, contradicting the minimality of $S(d)$. □
Recall that an ideal of the form $I = dD$ is called a principal ideal. If all ideals of $D$ are principal, then $D$ is called a principal ideal domain (PID). Theorem 16.36 says that every Euclidean domain is a PID.

PIDs enjoy many nice properties, including: 

Theorem 16.37. If D is a PID, then D is a UFD. 

For the rings $\mathbb{Z}$ and $F[X]$, the proof of part (i) of Definition 16.33 was a quite straightforward induction argument (as it also would be for any Euclidean domain). For a general PID, however, this requires a different sort of argument. We begin with the following fact:

Theorem 16.38. If $D$ is a PID, and $I_1 \subseteq I_2 \subseteq \cdots$ are ideals of $D$, then there exists an integer $k$ such that $I_k = I_{k+1} = \cdots$.

Proof. Let $I := \bigcup_{i=1}^{\infty} I_i$, which is an ideal of $D$ (see Exercise 7.37). Thus, $I = dD$ for some $d \in D$. But $d \in \bigcup_{i=1}^{\infty} I_i$ implies that $d \in I_k$ for some $k$, which shows that $I = dD \subseteq I_k$. It follows that $I = I_k = I_{k+1} = \cdots$. □

We can now prove the existence part of Theorem 16.37: 

Theorem 16.39. If $D$ is a PID, then every non-zero, non-unit element of $D$ can be expressed as a product of irreducibles in $D$.

Proof. Let $c \in D$, $c \ne 0$, and $c$ not a unit. If $c$ is irreducible, we are done. Otherwise, we can write $c = ab$, where neither $a$ nor $b$ are units. As ideals, we have $cD \subsetneq aD$ and $cD \subsetneq bD$ (the inclusions are proper because $b$ and $a$, respectively, are not units). If we continue this process recursively, building up a “factorization tree” where $c$ is at the root, $a$ and $b$ are the children of $c$, and so on, then the recursion must stop, since any infinite path in the tree would give rise to ideals

$$cD = I_1 \subsetneq I_2 \subsetneq \cdots,$$

contradicting Theorem 16.38. □

The proof of the uniqueness part of Theorem 16.37 is essentially the same as the proofs we gave for $\mathbb{Z}$ and $F[X]$.

Analogous to Theorems 1.7 and 16.13, we have: 

Theorem 16.40. Let $D$ be a PID. For all $a, b \in D$, there exists a greatest common divisor $d$ of $a$ and $b$, and moreover, $aD + bD = dD$.

Proof. Exercise. □ 

As an immediate consequence of the previous theorem, we see that in a PID $D$, for all $a, b \in D$ with greatest common divisor $d$, there exist $s, t \in D$ such that $as + bt = d$; moreover, $a, b \in D$ are relatively prime if and only if there exist $s, t \in D$ such that $as + bt = 1$.

Analogous to Theorems 1.9 and 16.14, we have: 

Theorem 16.41. Let $D$ be a PID. For all $a, b, c \in D$ such that $c \mid ab$ and $a$ and $c$ are relatively prime, we have $c \mid b$.

Proof. Exercise. □ 

Analogous to Theorems 1.10 and 16.15, we have:

Theorem 16.42. Let $D$ be a PID. Let $p \in D$ be irreducible, and let $a, b \in D$. Then $p \mid ab$ implies that $p \mid a$ or $p \mid b$.

Proof. Exercise. □ 

Theorem 16.37 now follows immediately from Theorems 16.39, 16.42, and 
16.34. 

Exercise 16.34. Show that $\mathbb{Z}[\sqrt{-2}]$ is a Euclidean domain.

Exercise 16.35. Consider the polynomial

$$X^3 - 1 = (X - 1)(X^2 + X + 1).$$

Over $\mathbb{C}$, the roots of $X^3 - 1$ are $1, (-1 \pm \sqrt{-3})/2$. Let $\omega := (-1 + \sqrt{-3})/2$, and note that $\omega^2 = -1 - \omega = (-1 - \sqrt{-3})/2$, and $\omega^3 = 1$.

(a) Show that the ring $\mathbb{Z}[\omega]$ consists of all elements of the form $a + b\omega$, where $a, b \in \mathbb{Z}$, and is an integral domain. This ring is called the ring of Eisenstein integers.

(b) Show that the only units in $\mathbb{Z}[\omega]$ are $\pm 1$, $\pm\omega$, and $\pm\omega^2$.

(c) Show that $\mathbb{Z}[\omega]$ is a Euclidean domain.

Exercise 16.36. Show that in a PID, all non-zero prime ideals are maximal (see Exercise 7.38).

Recall that for a complex number $\alpha = a + bi$, with $a, b \in \mathbb{R}$, the norm of $\alpha$ was defined as $N(\alpha) = \alpha\bar{\alpha} = a^2 + b^2$ (see Example 7.5). There are other measures of the “size” of a complex number that are useful. The absolute value of $\alpha$ is defined as $|\alpha| := \sqrt{N(\alpha)} = \sqrt{a^2 + b^2}$. The max norm of $\alpha$ is defined as $M(\alpha) := \max\{|a|, |b|\}$.

Exercise 16.37. Let $\alpha, \beta \in \mathbb{C}$. Prove the following statements:

(a) $|\alpha\beta| = |\alpha||\beta|$;

(b) $|\alpha + \beta| \le |\alpha| + |\beta|$;

(c) $N(\alpha + \beta) \le 2(N(\alpha) + N(\beta))$;

(d) $M(\alpha) \le |\alpha| \le \sqrt{2}\, M(\alpha)$.

The following exercises develop algorithms for computing with Gaussian integers. For computational purposes, we assume that a Gaussian integer $\alpha = a + bi$, with $a, b \in \mathbb{Z}$, is represented as the pair of integers $(a, b)$.

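Exercise 16.38. Let $\alpha, \beta \in \mathbb{Z}[i]$.

(a) Show how to compute $M(\alpha)$ in time $O(\mathrm{len}(M(\alpha)))$ and $N(\alpha)$ in time $O(\mathrm{len}(M(\alpha))^2)$.

(b) Show how to compute $\alpha + \beta$ in time $O(\mathrm{len}(M(\alpha)) + \mathrm{len}(M(\beta)))$.

(c) Show how to compute $\alpha \cdot \beta$ in time $O(\mathrm{len}(M(\alpha)) \cdot \mathrm{len}(M(\beta)))$.

(d) Assuming $\beta \ne 0$, show how to compute $\kappa, \rho \in \mathbb{Z}[i]$ such that $\alpha = \beta\kappa + \rho$, $N(\rho) \le \frac{1}{2} N(\beta)$, and $N(\kappa) \le 4N(\alpha)/N(\beta)$. Your algorithm should run in time $O(\mathrm{len}(M(\alpha)) \cdot \mathrm{len}(M(\beta)))$. Hint: see Example 16.24; also, to achieve the stated running time bound, your algorithm should first test if $M(\beta) \ge 2M(\alpha)$.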

Exercise 16.39. Using the division with remainder algorithm from part (d) of the previous exercise, adapt the Euclidean algorithm for (ordinary) integers to work with Gaussian integers. On inputs $\alpha, \beta \in \mathbb{Z}[i]$, your algorithm should compute a greatest common divisor $\delta \in \mathbb{Z}[i]$ of $\alpha$ and $\beta$ in time $O(\ell^3)$, where $\ell := \max\{\mathrm{len}(M(\alpha)), \mathrm{len}(M(\beta))\}$.
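A plain version of the Gaussian-integer Euclidean algorithm is sketched below, using the rounding division of Example 16.24 and representing $a + bi$ as the pair `(a, b)` as in the text. The helper names are hypothetical, and the sketch does not include the extra test from Exercise 16.38(d) that the stated running-time bound relies on; it only illustrates correctness.

```python
def gauss_divmod(alpha, beta):
    """Division with remainder in Z[i] by rounding alpha/beta (Example 16.24)."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d                             # N(beta), assumed non-zero
    x, y = a * c + b * d, b * c - a * d           # alpha * conj(beta)
    m, k = (2 * x + n) // (2 * n), (2 * y + n) // (2 * n)
    return (m, k), (a - (c * m - d * k), b - (c * k + d * m))


def gauss_gcd(alpha, beta):
    """A greatest common divisor of alpha and beta in Z[i] (pairs (a, b) = a + b*i).
    The result is determined only up to the units +-1, +-i."""
    while beta != (0, 0):
        _, rho = gauss_divmod(alpha, beta)
        alpha, beta = beta, rho                   # N(rho) <= N(beta)/2, so the loop ends
    return alpha


# 5 = (2 + i)(2 - i) and 3 + i = (1 + i)(2 - i), so the gcd is an associate of 2 - i
print(gauss_gcd((5, 0), (3, 1)))                  # (-1, -2), i.e. -1 - 2i = -i(2 - i)
```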

Exercise 16.40. Extend the algorithm of the previous exercise, so that it computes $\sigma, \tau \in \mathbb{Z}[i]$ such that $\alpha\sigma + \beta\tau = \delta$. Your algorithm should run in time $O(\ell^3)$, and it should also be the case that $\mathrm{len}(M(\sigma))$ and $\mathrm{len}(M(\tau))$ are $O(\ell)$.

The algorithms in the previous two exercises for computing greatest common divisors in $\mathbb{Z}[i]$ run in time cubic in the length of their input, whereas the corresponding algorithms for $\mathbb{Z}$ run in time quadratic in the length of their input. This is essentially because the running time of the algorithm for division with remainder discussed in Exercise 16.38 is insensitive to the size of the quotient.

To get a quadratic-time algorithm for computing greatest common divisors in $\mathbb{Z}[i]$, in the following exercises we shall develop an analog of the binary gcd algorithm for $\mathbb{Z}$.

Exercise 16.41. Let $\pi := 1 + i \in \mathbb{Z}[i]$.

(a) Show that $2 = \pi\bar{\pi} = -i\pi^2$, that $N(\pi) = 2$, and that $\pi$ is irreducible in $\mathbb{Z}[i]$.

(b) Let $\alpha \in \mathbb{Z}[i]$, with $\alpha = a + bi$ for $a, b \in \mathbb{Z}$. Show that $\pi \mid \alpha$ if and only if $a - b$ is even, in which case

$$\frac{\alpha}{\pi} = \frac{a + b}{2} + \frac{b - a}{2}\, i.$$

(c) Show that for all $\alpha \in \mathbb{Z}[i]$, we have $\alpha \equiv 0 \pmod{\pi}$ or $\alpha \equiv 1 \pmod{\pi}$.

(d) Show that the quotient ring $\mathbb{Z}[i]/\pi\mathbb{Z}[i]$ is isomorphic to the ring $\mathbb{Z}_2$.

(e) Show that for all $\alpha \in \mathbb{Z}[i]$ with $\alpha \equiv 1 \pmod{\pi}$, there exists a unique $\varepsilon \in \{\pm 1, \pm i\}$ such that $\alpha \equiv \varepsilon \pmod{2\pi}$.

(f) Show that for all $\alpha, \beta \in \mathbb{Z}[i]$ with $\alpha \equiv \beta \equiv 1 \pmod{\pi}$, there exists a unique $\varepsilon \in \{\pm 1, \pm i\}$ such that $\alpha \equiv \varepsilon\beta \pmod{2\pi}$.

Exercise 16.42. We now present a “(1 + i)-ary gcd algorithm” for Gaussian integers. Let $\pi := 1 + i \in \mathbb{Z}[i]$. The algorithm takes non-zero $\alpha, \beta \in \mathbb{Z}[i]$ as input, and runs as follows:

        ρ ← α, ρ' ← β, e ← 0
        while π | ρ and π | ρ' do ρ ← ρ/π, ρ' ← ρ'/π, e ← e + 1
        repeat
            while π | ρ do ρ ← ρ/π
            while π | ρ' do ρ' ← ρ'/π
            if M(ρ') < M(ρ) then (ρ, ρ') ← (ρ', ρ)
            determine ε ∈ {±1, ±i} such that ρ' ≡ ερ (mod 2π)
    (*)     ρ' ← ρ' − ερ
        until ρ' = 0
        δ ← π^e · ρ
        output δ

Show that this algorithm correctly computes a greatest common divisor of $\alpha$ and $\beta$, and that it can be implemented so as to run in time $O(\ell^2)$, where $\ell := \max\{\mathrm{len}(M(\alpha)), \mathrm{len}(M(\beta))\}$. Hint: to analyze the running time, for $i = 1, 2, \ldots$, let $v_i$ (respectively, $v'_i$) denote the value of $|\rho\rho'|$ just before (respectively, after) the execution of the line marked (*) in loop iteration $i$, and show that

$$v'_i \le (1 + \sqrt{2})\, v_i \quad\text{and}\quad v_{i+1} \le v'_i / 2\sqrt{2}.$$
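The following Python sketch transcribes the pseudocode of Exercise 16.42 directly, again storing $a + bi$ as `(a, b)`. The helper names are hypothetical; $\varepsilon$ is found by simply trying all four units, and divisibility by $2\pi$ is tested by dividing by $\pi$ three times, using $2\pi = -i\pi^3$. It makes no attempt at the careful bit-level implementation that the $O(\ell^2)$ bound requires.

```python
PI = (1, 1)                                    # pi = 1 + i
UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1)]     # 1, -1, i, -i


def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)


def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])


def max_norm(z):
    """M(z) = max{|a|, |b|} for z = a + b*i."""
    return max(abs(z[0]), abs(z[1]))


def pi_divides(z):
    """pi | a + b*i iff a - b is even (Exercise 16.41(b))."""
    return (z[0] - z[1]) % 2 == 0


def div_pi(z):
    """(a + b*i)/pi = (a + b)/2 + ((b - a)/2) i, assuming pi divides z."""
    a, b = z
    return ((a + b) // 2, (b - a) // 2)


def two_pi_divides(z):
    """2*pi = -i * pi^3, so 2*pi | z iff pi divides z at least three times."""
    for _ in range(3):
        if not pi_divides(z):
            return False
        z = div_pi(z)
    return True


def pi_ary_gcd(alpha, beta):
    """A gcd of non-zero alpha, beta in Z[i], following Exercise 16.42."""
    rho, rho2, e = alpha, beta, 0
    while pi_divides(rho) and pi_divides(rho2):
        rho, rho2, e = div_pi(rho), div_pi(rho2), e + 1
    while True:                                # repeat ... until rho2 = 0
        while pi_divides(rho):
            rho = div_pi(rho)
        while pi_divides(rho2):
            rho2 = div_pi(rho2)
        if max_norm(rho2) < max_norm(rho):
            rho, rho2 = rho2, rho
        eps = next(u for u in UNITS if two_pi_divides(sub(rho2, mul(u, rho))))
        rho2 = sub(rho2, mul(eps, rho))        # the line marked (*)
        if rho2 == (0, 0):
            break
    delta = rho
    for _ in range(e):                         # delta := pi^e * rho
        delta = mul(delta, PI)
    return delta


print(pi_ary_gcd((5, 0), (3, 1)))              # (2, -1), i.e. 2 - i
```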

Exercise 16.43. Extend the algorithm of the previous exercise, so that it computes $\sigma, \tau \in \mathbb{Z}[i]$ such that $\alpha\sigma + \beta\tau = \delta$. Your algorithm should run in time $O(\ell^2)$, and it should also be the case that $\mathrm{len}(M(\sigma))$ and $\mathrm{len}(M(\tau))$ are $O(\ell)$. Hint: adapt the algorithm in Exercise 4.10.

Exercise 16.44. In Exercise 16.41, we saw that 2 factors as $-i(1 + i)^2$ in $\mathbb{Z}[i]$, where $1 + i$ is irreducible. This exercise examines the factorization in $\mathbb{Z}[i]$ of prime numbers $p > 2$. Show that:

(a) for every irreducible $\pi \in \mathbb{Z}[i]$, there exists a unique prime number $p$ such that $\pi$ divides $p$;

(b) for all prime numbers $p \equiv 1 \pmod 4$, we have $p = \pi\bar{\pi}$, where $\pi \in \mathbb{Z}[i]$ is irreducible, and the complex conjugate $\bar{\pi}$ of $\pi$ is also irreducible and not associate to $\pi$;

(c) all prime numbers $p \equiv 3 \pmod 4$ are irreducible in $\mathbb{Z}[i]$.

Hint: for parts (b) and (c), use Theorem 2.34.
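Part (b) can be illustrated computationally. The sketch below relies on a standard fact that is not proved in this section and is stated here as an assumption: if $p \equiv 1 \pmod 4$ and $x^2 \equiv -1 \pmod p$, then a greatest common divisor of $p$ and $x + i$ in $\mathbb{Z}[i]$ has norm $p$. It finds $x$ as $c^{(p-1)/4} \bmod p$ for a quadratic non-residue $c$ (located with Euler's criterion), then runs the Euclidean algorithm in $\mathbb{Z}[i]$. The names are hypothetical, and the output only illustrates part (b); it does not prove it.

```python
def gauss_divmod(alpha, beta):
    """Division with remainder in Z[i] by rounding alpha/beta (Example 16.24)."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    x, y = a * c + b * d, b * c - a * d           # alpha * conj(beta)
    m, k = (2 * x + n) // (2 * n), (2 * y + n) // (2 * n)
    return (m, k), (a - (c * m - d * k), b - (c * k + d * m))


def gauss_gcd(alpha, beta):
    while beta != (0, 0):
        alpha, beta = beta, gauss_divmod(alpha, beta)[1]
    return alpha


def split_prime(p):
    """For a prime p = 1 (mod 4), return (a, b) with p = a^2 + b^2,
    i.e. p = pi * conj(pi) for pi = a + b*i."""
    assert p % 4 == 1
    c = 2
    while pow(c, (p - 1) // 2, p) != p - 1:       # Euler's criterion: find a non-residue
        c += 1
    x = pow(c, (p - 1) // 4, p)                   # then x^2 = c^((p-1)/2) = -1 (mod p)
    return gauss_gcd((p, 0), (x, 1))


print(split_prime(5), split_prime(13), split_prime(1009))
```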


16.9.2 Unique factorization in $D[X]$

In this section, we prove the following: 

Theorem 16.43. If $D$ is a UFD, then so is $D[X]$.

This theorem implies, for example, that $\mathbb{Z}[X]$ is a UFD. Applying the theorem inductively, one also sees that $\mathbb{Z}[X_1, \ldots, X_n]$ is a UFD, as is $F[X_1, \ldots, X_n]$ for every field $F$.

We begin with some simple observations. First, recall that for an integral domain $D$, $D[X]$ is an integral domain, and the units in $D[X]$ are precisely the units in $D$. Second, it is easy to see that an element of $D$ is irreducible in $D$ if and only if it is irreducible in $D[X]$. Third, for $c \in D$ and $f = \sum_i c_i X^i \in D[X]$, we have $c \mid f$ if and only if $c \mid c_i$ for all $i$.

We call a non-zero polynomial $f \in D[X]$ primitive if the only elements of $D$ that divide $f$ are units. If $D$ is a UFD, then given any non-zero polynomial $f \in D[X]$, we can write it as $f = cf'$, where $c \in D$ and $f' \in D[X]$ is a primitive polynomial: just take $c$ to be a greatest common divisor of all the coefficients of $f$.

Example 16.25. In $\mathbb{Z}[X]$, the polynomial $f = 4X^2 + 6X + 20$ is not primitive, but we can write $f = 2f'$, where $f' = 2X^2 + 3X + 10$ is primitive. □

It is easy to prove the existence part of Theorem 16.43: 

Theorem 16.44. Let $D$ be a UFD. Every non-zero, non-unit element of $D[X]$ can be expressed as a product of irreducibles in $D[X]$.

Proof. Let $f$ be a non-zero, non-unit polynomial in $D[X]$. If $f$ is a constant, then because $D$ is a UFD, $f$ factors into irreducibles in $D$. So assume $f$ is not constant. If $f$ is not primitive, we can write $f = cf'$, where $c$ is a non-zero, non-unit in $D$, and $f'$ is a primitive, non-constant polynomial in $D[X]$. Again, as $D$ is a UFD, $c$ factors into irreducibles in $D$.





From the above discussion, it suffices to prove the theorem for non-constant, primitive polynomials $f \in D[X]$. If $f$ is itself irreducible, we are done. Otherwise, we can write $f = gh$, where $g, h \in D[X]$ and neither $g$ nor $h$ are units. Further, by the assumption that $f$ is a primitive, non-constant polynomial, both $g$ and $h$ must also be primitive, non-constant polynomials; in particular, both $g$ and $h$ have degree strictly less than $\deg(f)$, and the theorem follows by induction on degree. □

The uniqueness part of Theorem 16.43 is (as usual) more difficult. We begin with the following fact:

Theorem 16.45. Let $D$ be a UFD, let $p$ be an irreducible in $D$, and let $g, h \in D[X]$. Then $p \mid gh$ implies $p \mid g$ or $p \mid h$.

Proof. Consider the quotient ring $D/pD$, which is an integral domain (because $D$ is a UFD), and the corresponding ring of polynomials $(D/pD)[X]$, which is also an integral domain. Also consider the natural map that sends $a \in D$ to $\bar{a} := [a]_p \in D/pD$, which we can extend coefficient-wise to a ring homomorphism from $D[X]$ to $(D/pD)[X]$ (see Example 7.46). If $p \mid gh$, then we have

$$0 = \overline{gh} = \bar{g}\bar{h},$$

and since $(D/pD)[X]$ is an integral domain, it follows that $\bar{g} = 0$ or $\bar{h} = 0$, which means that $p \mid g$ or $p \mid h$. □

Theorem 16.46. Let $D$ be a UFD. The product of two primitive polynomials in $D[X]$ is also primitive.

Proof. Let $g, h \in D[X]$ be primitive polynomials, and let $f := gh$. If $f$ is not primitive, then $c \mid f$ for some non-zero, non-unit $c \in D$, and as $D$ is a UFD, there is some irreducible element $p \in D$ that divides $c$, and therefore, divides $f$ as well. By Theorem 16.45, it follows that $p \mid g$ or $p \mid h$, which implies that either $g$ is not primitive or $h$ is not primitive. □
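Contents and primitive parts in $\mathbb{Z}[X]$ are easy to compute, and Theorem 16.46 can be spot-checked on examples. In the sketch below (hypothetical helper names), polynomials are lists of integer coefficients, lowest degree first, and the content is normalized to be non-negative, which is one of the two associate choices in $\mathbb{Z}$. The last check uses the well-known consequence of Theorem 16.46 that the content of a product is the product of the contents.

```python
from functools import reduce
from math import gcd


def content(f):
    """A gcd (taken non-negative) of the coefficients of a non-zero f in Z[X]."""
    return reduce(gcd, f)


def primitive_part(f):
    """Write f = c * f' with f' primitive, returning (c, f')."""
    c = content(f)
    return c, [a // c for a in f]


def poly_mul(g, h):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out


# Example 16.25: 4X^2 + 6X + 20 = 2 * (2X^2 + 3X + 10)
print(primitive_part([20, 6, 4]))                 # (2, [10, 3, 2])
# 2 + 3X and 5 + 7X are primitive, and so is their product
print(content(poly_mul([2, 3], [5, 7])))          # 1
```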

Suppose that $D$ is a UFD and that $F$ is its field of fractions. Any non-zero polynomial $f \in F[X]$ can always be written as $f = (c/d)f'$, where $c, d \in D$, with $d \ne 0$, and $f' \in D[X]$ is primitive. To see this, clear the denominators of the coefficients of $f$, writing $df = f''$, where $0 \ne d \in D$ and $f'' \in D[X]$. Then take $c$ to be a greatest common divisor of the coefficients of $f''$, so that $f'' = cf'$, where $f' \in D[X]$ is primitive. Then we have $f = (c/d)f'$, as required. Of course, we may assume that $c$ and $d$ are relatively prime; if not, we may divide $c$ and $d$ by a greatest common divisor.

Example 16.26. Let $f = (3/5)X^2 + 9X + 3/2 \in \mathbb{Q}[X]$. Then we can write $f = (3/10)f'$, where $f' = 2X^2 + 30X + 5 \in \mathbb{Z}[X]$ is primitive. □
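The clearing-of-denominators argument above is directly computable for $D = \mathbb{Z}$ and $F = \mathbb{Q}$. In the sketch below (the name `rational_primitive` is hypothetical), coefficients are `fractions.Fraction` values listed lowest degree first, and `math.lcm` assumes Python 3.9 or later. Since `Fraction` always reduces to lowest terms, the returned $c/d$ automatically has $c$ and $d$ relatively prime.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm   # math.lcm needs Python 3.9+


def rational_primitive(f):
    """Write a non-zero f in Q[X] (coefficients lowest degree first) as
    (c/d) * f' with f' a primitive polynomial in Z[X]; returns (c/d, f')."""
    f = [Fraction(a) for a in f]
    d = reduce(lcm, (a.denominator for a in f))   # clear denominators: d*f = f''
    f2 = [int(a * d) for a in f]                   # f'' in Z[X]
    c = reduce(gcd, f2)                            # a gcd of the coefficients of f''
    return Fraction(c, d), [a // c for a in f2]


# Example 16.26: (3/5)X^2 + 9X + 3/2 = (3/10)(2X^2 + 30X + 5)
print(rational_primitive([Fraction(3, 2), 9, Fraction(3, 5)]))
```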
As a consequence of the previous theorem, we have:

Theorem 16.47. Let $D$ be a UFD and let $F$ be its field of fractions. Suppose that $f, g \in D[X]$ and $h \in F[X]$ are non-zero polynomials such that $f = gh$ and $g$ is primitive. Then $h \in D[X]$.

Proof. Write $h = (c/d)h'$, where $c, d \in D$ and $h' \in D[X]$ is primitive. Let us assume that $c$ and $d$ are relatively prime. Then we have

$$d \cdot f = c \cdot gh'. \qquad (16.9)$$

We claim that $d \in D^*$. To see this, note that (16.9) implies that $d \mid (c \cdot gh')$, and the assumption that $c$ and $d$ are relatively prime implies that $d \mid gh'$. But by Theorem 16.46, $gh'$ is primitive, from which it follows that $d$ is a unit. That proves the claim.

It follows that $c/d \in D$, and hence $h = (c/d)h' \in D[X]$. □

Theorem 16.48. Let $D$ be a UFD and $F$ its field of fractions. If $f \in D[X]$ with $\deg(f) > 0$ is irreducible, then $f$ is also irreducible in $F[X]$.

Proof. Suppose that $f$ is not irreducible in $F[X]$, so that $f = gh$ for non-constant polynomials $g, h \in F[X]$, both of degree strictly less than that of $f$. We may write $g = (c/d)g'$, where $c, d \in D$ and $g' \in D[X]$ is primitive. Set $h' := (c/d)h$, so that $f = gh = g'h'$. By Theorem 16.47, we have $h' \in D[X]$, and this shows that $f$ is not irreducible in $D[X]$. □

Theorem 16.49. Let $D$ be a UFD. Let $f \in D[X]$ with $\deg(f) > 0$ be irreducible, and let $g, h \in D[X]$. If $f$ divides $gh$ in $D[X]$, then $f$ divides either $g$ or $h$ in $D[X]$.

Proof. Suppose that $f \in D[X]$ with $\deg(f) > 0$ is irreducible. This implies that $f$ is a primitive polynomial. By Theorem 16.48, $f$ is irreducible in $F[X]$, where $F$ is the field of fractions of $D$. Suppose $f$ divides $gh$ in $D[X]$. Then because $F[X]$ is a UFD, $f$ divides either $g$ or $h$ in $F[X]$. But Theorem 16.47 implies that $f$ divides either $g$ or $h$ in $D[X]$. □

Theorem 16.43 now follows immediately from Theorems 16.44, 16.45, and 
16.49, together with Theorem 16.34. 

In the proof of Theorem 16.43, there is a clear connection between factorization in $D[X]$ and $F[X]$, where $F$ is the field of fractions of $D$. We should perhaps make this connection more explicit. Let $f \in D[X]$ be a non-zero polynomial. We may write $f$ as

$$f = u\, p_1^{a_1} \cdots p_r^{a_r} f_1^{b_1} \cdots f_s^{b_s},$$

where $u \in D^*$, the $p_i$’s are non-associate, irreducible elements of $D$, and the $f_j$’s are non-associate, irreducible, non-constant polynomials over $D$ (and in particular, primitive). For $j = 1, \ldots, s$, let $g_j := \mathrm{lc}(f_j)^{-1} f_j$ be the monic associate of $f_j$ in $F[X]$. Then in $F[X]$, $f$ factors as

$$f = c\, g_1^{b_1} \cdots g_s^{b_s},$$

where

$$c := u \prod_{i} p_i^{a_i} \prod_{j} \mathrm{lc}(f_j)^{b_j} \in F,$$

and the $g_j$’s are distinct, irreducible, monic polynomials over $F$.

Example 16.27. Consider the polynomial $f = 4X^2 + 2X - 2 \in \mathbb{Z}[X]$. Over $\mathbb{Z}[X]$, $f$ factors as $2(2X - 1)(X + 1)$, where each of these three factors is irreducible in $\mathbb{Z}[X]$. However, over $\mathbb{Q}[X]$, $f$ factors as $4(X - 1/2)(X + 1)$, where 4 is a unit, and the other two factors are irreducible. □

The following theorem provides a useful criterion for establishing that a polynomial is irreducible.

Theorem 16.50 (Eisenstein’s criterion). Let $D$ be a UFD and $F$ its field of fractions. Let $f = c_n X^n + c_{n-1} X^{n-1} + \cdots + c_0 \in D[X]$. If there exists an irreducible $p \in D$ such that

$$p \nmid c_n, \quad p \mid c_{n-1}, \quad \ldots, \quad p \mid c_0, \quad p^2 \nmid c_0,$$

then $f$ is irreducible over $F$.

Proof. Let $f$ be as above, and suppose it were not irreducible in $F[X]$. Then by Theorem 16.48, we could write $f = gh$, where $g, h \in D[X]$, both of degree strictly less than that of $f$. Let us write

$$g = a_k X^k + \cdots + a_0 \quad\text{and}\quad h = b_\ell X^\ell + \cdots + b_0,$$

where $a_k \ne 0$ and $b_\ell \ne 0$, so that $0 < k < n$ and $0 < \ell < n$. Now, since $c_n = a_k b_\ell$, and $p \nmid c_n$, it follows that $p \nmid a_k$ and $p \nmid b_\ell$. Further, since $c_0 = a_0 b_0$, and $p \mid c_0$ but $p^2 \nmid c_0$, it follows that $p$ divides one of $a_0$ or $b_0$, but not both; for concreteness, let us assume that $p \mid a_0$ but $p \nmid b_0$. Also, let $m$ be the smallest positive integer such that $p \nmid a_m$; note that $0 < m \le k < n$.

Now consider the natural map that sends $a \in D$ to $\bar{a} := [a]_p \in D/pD$, which we can extend coefficient-wise to a ring homomorphism from $D[X]$ to $(D/pD)[X]$ (see Example 7.46). Because $D$ is a UFD and $p$ is irreducible, $D/pD$ is an integral domain. Since $f = gh$, we have

$$\bar{c}_n X^n = \bar{f} = \bar{g}\bar{h} = (\bar{a}_k X^k + \cdots + \bar{a}_m X^m)(\bar{b}_\ell X^\ell + \cdots + \bar{b}_0). \qquad (16.10)$$

But notice that when we multiply out the two polynomials on the right-hand side of (16.10), the coefficient of $X^m$ is $\bar{a}_m \bar{b}_0 \ne 0$, and as $m < n$, this clearly contradicts the fact that the coefficient of $X^m$ in the polynomial on the left-hand side of (16.10) is zero. □

As an application of Eisenstein’s criterion, we have: 

Theorem 16.51. For every prime number $q$, the $q$th cyclotomic polynomial

$$\Phi_q := \frac{X^q - 1}{X - 1} = X^{q-1} + X^{q-2} + \cdots + 1$$

is irreducible over $\mathbb{Q}$.


Proof. Let
$$f := \Phi_q(X+1) = \frac{(X+1)^q - 1}{(X+1) - 1}.$$
It is easy to see that
$$f = \sum_{i=0}^{q-1} c_i X^i, \quad\text{where } c_i = \binom{q}{i+1} \quad (i = 0, \ldots, q-1).$$
Thus, $c_{q-1} = 1$, $c_0 = q$ (so $q^2 \nmid c_0$), and for $0 \le i < q - 1$, we have $q \mid c_i$ (see Exercise 1.14). Theorem 16.50 therefore applies, and we conclude that $f$ is irreducible over $\mathbb{Q}$. It follows that $\Phi_q$ is irreducible over $\mathbb{Q}$, since if $\Phi_q = gh$ were a non-trivial factorization of $\Phi_q$, then $f = \Phi_q(X+1) = g(X+1) \cdot h(X+1)$ would be a non-trivial factorization of $f$. □
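The identity $c_i = \binom{q}{i+1}$ and the resulting Eisenstein conditions can be spot-checked numerically. This Python sketch (the helper name is ours) expands $\Phi_q(X+1) = \sum_{j=0}^{q-1} (X+1)^j$ with the binomial theorem and compares the result against $\binom{q}{i+1}$ for a few small primes; it is a sanity check, not a substitute for the proof. It uses `math.comb`, available from Python 3.8.

```python
from math import comb


def shifted_cyclotomic(q):
    """Coefficients [c_0, ..., c_{q-1}] of Phi_q(X + 1) = sum_{j=0}^{q-1} (X + 1)^j.

    The coefficient of X^i in (X + 1)^j is comb(j, i).
    """
    return [sum(comb(j, i) for j in range(i, q)) for i in range(q)]


for q in [2, 3, 5, 7, 11, 13]:
    c = shifted_cyclotomic(q)
    assert c == [comb(q, i + 1) for i in range(q)]   # c_i = binom(q, i + 1)
    assert c[-1] == 1 and c[0] == q                  # c_{q-1} = 1 and c_0 = q
    assert all(ci % q == 0 for ci in c[:-1])         # q divides c_0, ..., c_{q-2}
    assert c[0] % (q * q) != 0                       # q^2 does not divide c_0

print(shifted_cyclotomic(5))  # [5, 10, 10, 5, 1]
```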


Exercise 16.45. Show that neither $\mathbb{Z}[X]$ nor $F[X, Y]$ (where $F$ is a field) are PIDs (even though they are UFDs).

Exercise 16.46. Let $f \in \mathbb{Z}[X]$ be a monic polynomial. Show that if $f$ has a root $x \in \mathbb{Q}$, then $x \in \mathbb{Z}$, and $x$ divides the constant term of $f$.

Exercise 16.47. Let $D$ be a UFD, let $p$ be an irreducible element of $D$, and consider the natural map that sends $a \in D$ to $\bar{a} := [a]_p \in D/pD$, which we extend coefficient-wise to a ring homomorphism from $D[X]$ to $(D/pD)[X]$ (see Example 7.46). Show that if $f \in D[X]$ is a primitive polynomial such that $p \nmid \mathrm{lc}(f)$ and $\bar{f} \in (D/pD)[X]$ is irreducible, then $f$ is irreducible.

Exercise 16.48. Let $a$ be a non-zero, square-free integer, with $a \notin \{\pm 1\}$, and let $n$ be a positive integer. Show that the polynomial $X^n - a$ is irreducible in $\mathbb{Q}[X]$.

Exercise 16.49. Show that the polynomial $X^4 + 1$ is irreducible in $\mathbb{Q}[X]$.
Exercise 16.50. Let $F$ be a field, and consider the ring of bivariate polynomials $F[X, Y]$. Show that in this ring, the polynomial $X^2 + Y^2 - 1$ is irreducible, provided $F$ does not have characteristic 2. What happens if $F$ has characteristic 2?

Exercise 16.51. Design and analyze an efficient algorithm for the following problem. The input is a pair of polynomials $g, h \in \mathbb{Z}[X]$, along with their greatest common divisor $d$ in the ring $\mathbb{Q}[X]$. The output is the greatest common divisor of $g$ and $h$ in the ring $\mathbb{Z}[X]$.

Exercise 16.52. Let $g, h \in \mathbb{Z}[X]$ be non-zero polynomials with $d := \gcd(g, h) \in \mathbb{Z}[X]$. Show that for every prime $p$ not dividing $\mathrm{lc}(g)\,\mathrm{lc}(h)$, we have $\bar{d} \mid \gcd(\bar{g}, \bar{h})$, and except for finitely many primes $p$, we have $\bar{d} = \gcd(\bar{g}, \bar{h})$. Here, $\bar{d}$, $\bar{g}$, and $\bar{h}$ denote the images of $d$, $g$, and $h$ in $\mathbb{Z}_p[X]$ under the coefficient-wise extension of the natural map from $\mathbb{Z}$ to $\mathbb{Z}_p$ (see Example 7.47).

Exercise 16.53. Let $F$ be a field, and let $g, h \in F[X, Y]$. Define $V(g, h) := \{(x, y) \in F \times F : g(x, y) = h(x, y) = 0\}$. Show that if $g$ and $h$ are relatively prime, then $V(g, h)$ is a finite set. Hint: consider the rings $F(X)[Y]$ and $F(Y)[X]$.


16.10 Notes 

The “$(1 + i)$-ary gcd algorithm” in Exercise 16.42 for computing greatest common divisors of Gaussian integers is based on algorithms in Weilert [106] and Damgård and Frandsen [31]. The latter paper also develops a corresponding algorithm for Eisenstein integers (see Exercise 16.35). Weilert [107] presents an asymptotically fast algorithm that computes the greatest common divisor of Gaussian integers of length at most $\ell$ in time $O(\ell^{1+o(1)})$.
In this chapter, we study algorithms for performing arithmetic on polynomials.
Initially, we shall adopt a very general point of view, discussing polynomials whose 
coefficients lie in an arbitrary ring R, and then specialize to the case where the 
coefficient ring is a field F. 

There are many similarities between arithmetic in $\mathbb{Z}$ and in $R[X]$, and the similarities between $\mathbb{Z}$ and $F[X]$ run even deeper. Many of the algorithms we discuss in this chapter are quite similar to the corresponding algorithms for integers.

As we did in Chapter 14 for matrices, we shall treat R as an “abstract data 
type,” and measure the complexity of algorithms for polynomials over a ring R by 
counting “operations in R.” 


17.1 Basic arithmetic 

Throughout this section, R denotes a non-trivial ring. 

For computational purposes, we shall assume that a polynomial $g = \sum_{i=0}^{k-1} a_i X^i \in R[X]$ is represented as a coefficient vector $(a_0, a_1, \ldots, a_{k-1})$. Further, when $g$ is non-zero, the coefficient $a_{k-1}$ should be non-zero.

The basic algorithms for addition, subtraction, multiplication, and division of polynomials are quite straightforward adaptations of the corresponding algorithms for integers. In fact, because of the lack of “carries,” these algorithms are actually much simpler in the polynomial case. We briefly discuss these algorithms here; analogous to our treatment of integer arithmetic, we do not discuss the details of “stripping” leading zero coefficients.

For addition and subtraction, all we need to do is to add or subtract coefficient 
vectors. 

For multiplication, let $g = \sum_{i=0}^{k-1} a_i X^i \in R[X]$ and $h = \sum_{i=0}^{\ell-1} b_i X^i \in R[X]$, where $k \ge 1$ and $\ell \ge 1$. The product $f := g \cdot h$ is of the form $f = \sum_{i=0}^{k+\ell-2} c_i X^i$, the coefficients of which can be computed using $O(k\ell)$ operations in $R$ as follows:
    for i ← 0 to k + ℓ − 2 do c_i ← 0
    for i ← 0 to k − 1 do
        for j ← 0 to ℓ − 1 do
            c_{i+j} ← c_{i+j} + a_i · b_j
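The multiplication loop carries over to Python almost verbatim. In this sketch, Python integers stand in for the abstract ring $R$ (an assumption for illustration; any values supporting `+` and `*` would do), and coefficient vectors are lists with $a_0$ first.

```python
def poly_mul(a, b):
    """Schoolbook product of g = sum a_i X^i and h = sum b_j X^j.

    Requires len(a) = k >= 1 and len(b) = l >= 1; returns the k + l - 1
    coefficients c_0, ..., c_{k+l-2}, using k*l ring multiplications.
    """
    k, l = len(a), len(b)
    c = [0] * (k + l - 1)            # c_i <- 0
    for i in range(k):
        for j in range(l):
            c[i + j] = c[i + j] + a[i] * b[j]
    return c


print(poly_mul([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```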

For division, let $g = \sum_{i=0}^{k-1} a_i X^i \in R[X]$ and $h = \sum_{i=0}^{\ell-1} b_i X^i \in R[X]$, where $b_{\ell-1} \in R^*$. We want to compute polynomials $q, r \in R[X]$ such that $g = hq + r$, where $\deg(r) < \ell - 1$. If $k < \ell$, we can simply set $q \leftarrow 0$ and $r \leftarrow g$; otherwise, we can compute $q$ and $r$ using $O(\ell \cdot (k - \ell + 1))$ operations in $R$ using the following algorithm:

    t ← b_{ℓ−1}^{−1} ∈ R
    for i ← 0 to k − 1 do r_i ← a_i
    for i ← k − ℓ down to 0 do
        q_i ← t · r_{i+ℓ−1}
        for j ← 0 to ℓ − 1 do
            r_{i+j} ← r_{i+j} − q_i · b_j
    q ← Σ_{i=0}^{k−ℓ} q_i X^i,  r ← Σ_{i=0}^{ℓ−2} r_i X^i
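The division loop can be transcribed the same way. Because it needs $b_{\ell-1}^{-1}$, this sketch fixes $R = \mathbb{Z}_p$ for a prime $p$, an assumption made only so that the example runs; the modular inverse comes from Python's built-in three-argument `pow` with exponent $-1$ (Python 3.8 and later).

```python
def poly_divmod(a, b, p):
    """Divide g = sum a_i X^i by h = sum b_i X^i over Z_p.

    Assumes b[-1] is invertible mod p. Returns (q, r) with g = h*q + r and
    deg(r) < deg(h); r is returned with high-order zeros stripped.
    """
    k, l = len(a), len(b)
    if k < l:
        return [], [x % p for x in a]       # q <- 0, r <- g
    t = pow(b[-1], -1, p)                   # t <- b_{l-1}^{-1}
    r = [x % p for x in a]                  # r_i <- a_i
    q = [0] * (k - l + 1)
    for i in range(k - l, -1, -1):          # i from k - l down to 0
        q[i] = t * r[i + l - 1] % p
        for j in range(l):
            r[i + j] = (r[i + j] - q[i] * b[j]) % p
    r = r[:l - 1]                           # only r_0, ..., r_{l-2} can be non-zero
    while r and r[-1] == 0:
        r.pop()
    return q, r


# X^3 + 2X + 5 = (X - 1)(X^2 + X + 3) + 1 over Z_7
print(poly_divmod([5, 2, 0, 1], [6, 1], 7))  # ([3, 1, 1], [1])
```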

With these simple algorithms, we obtain the polynomial analog of Theorem 3.3. Let us define the length of $g \in R[X]$, denoted $\mathrm{len}(g)$, to be the length of its coefficient vector; more precisely, we define
$$\mathrm{len}(g) := \begin{cases} \deg(g) + 1 & \text{if } g \ne 0, \\ 1 & \text{if } g = 0. \end{cases}$$

Sometimes (but not always) it is clearer and more convenient to state the running 
times of algorithms in terms of the length, rather than the degree, of a polynomial 
(the latter has the inconvenient habit of taking on the value 0, or worse, $-\infty$).

Theorem 17.1. Let $g$ and $h$ be arbitrary polynomials in $R[X]$.

(i) We can compute $g \pm h$ with $O(\mathrm{len}(g) + \mathrm{len}(h))$ operations in $R$.

(ii) We can compute $g \cdot h$ with $O(\mathrm{len}(g)\,\mathrm{len}(h))$ operations in $R$.

(iii) If $\mathrm{lc}(h) \in R^*$, we can compute $q, r \in R[X]$ such that $g = hq + r$ and $\deg(r) < \deg(h)$ with $O(\mathrm{len}(h)\,\mathrm{len}(q))$ operations in $R$.

Analogous to algorithms for modular integer arithmetic, we can also do arithmetic in the residue class ring $R[X]/(f)$, where $f \in R[X]$ is a polynomial with $\mathrm{lc}(f) \in R^*$. For each $\alpha \in R[X]/(f)$, there exists a unique polynomial $g \in R[X]$ with $\deg(g) < \deg(f)$ and $\alpha = [g]_f$; we call this polynomial $g$ the canonical representative of $\alpha$, and denote it by $\mathrm{rep}(\alpha)$. For computational purposes, we represent elements of $R[X]/(f)$ by their canonical representatives.
With this representation, addition and subtraction in $R[X]/(f)$ can be performed using $O(\mathrm{len}(f))$ operations in $R$, while multiplication takes $O(\mathrm{len}(f)^2)$ operations in $R$.

The repeated-squaring algorithm for computing powers works equally well in this setting: given $\alpha \in R[X]/(f)$ and a non-negative exponent $e$, we can compute $\alpha^e$ using $O(\mathrm{len}(e))$ multiplications in $R[X]/(f)$, for a total of $O(\mathrm{len}(e)\,\mathrm{len}(f)^2)$ operations in $R$.
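As a concrete illustration, here is a sketch of arithmetic in $R[X]/(f)$ on canonical representatives, again taking $R = \mathbb{Z}_p$ for a prime $p$. The function names are hypothetical. The power function uses the right-to-left form of repeated squaring (scanning the bits of $e$ from least to most significant), which also uses $O(\mathrm{len}(e))$ multiplications in $R[X]/(f)$.

```python
def reduce_mod(g, f, p):
    """Canonical representative of [g]_f in Z_p[X]/(f); lc(f) must be a unit mod p."""
    l = len(f) - 1                         # deg(f)
    g = [x % p for x in g]
    inv = pow(f[-1], -1, p)
    for d in range(len(g) - 1, l - 1, -1):
        c = g[d] * inv % p                 # subtract c * X^(d-l) * f to clear X^d
        if c:
            for j in range(l + 1):
                g[d - l + j] = (g[d - l + j] - c * f[j]) % p
    g = g[:l]
    while g and g[-1] == 0:
        g.pop()
    return g


def mul_mod(a, b, f, p):
    """Product of canonical representatives: O(len(f)^2) operations in Z_p."""
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    return reduce_mod(prod, f, p)


def pow_mod(alpha, e, f, p):
    """alpha^e in Z_p[X]/(f) by repeated squaring: O(len(e)) calls to mul_mod."""
    result = reduce_mod([1], f, p)         # canonical representative of 1
    base = alpha
    while e > 0:
        if e & 1:
            result = mul_mod(result, base, f, p)
        base = mul_mod(base, base, f, p)
        e >>= 1
    return result


# In Z_7[X]/(X^2 + 1), xi = [X] satisfies xi^2 = -1 = 6.
print(pow_mod([0, 1], 2, [1, 0, 1], 7))  # [6]
```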

Exercise 17.1. State and re-work the polynomial analogs of Exercises 3.26-3.28.

Exercise 17.2. Given a polynomial $g \in R[X]$ and an element $x \in R$, a particularly elegant and efficient way of computing $g(x)$ is called Horner’s rule. Suppose $g = \sum_{i=0}^{k-1} a_i X^i$, where $k \ge 0$ and $a_i \in R$ for $i = 0, \ldots, k-1$. Horner’s rule computes $g(x)$ as follows:

    y ← 0
    for i ← k − 1 down to 0 do
        y ← yx + a_i
    output y

Show that this algorithm correctly computes $g(x)$ using $k$ multiplications in $R$ and $k$ additions in $R$.
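Horner’s rule translates line for line into Python. In this sketch the coefficients and $x$ may be any values supporting `+` and `*` (Python integers in the example); the function name is ours.

```python
def horner(coeffs, x):
    """Evaluate g = sum_{i<k} a_i X^i at x, where coeffs = [a_0, ..., a_{k-1}].

    Performs exactly k multiplications and k additions.
    """
    y = 0
    for a in reversed(coeffs):   # i = k - 1 down to 0
        y = y * x + a
    return y


print(horner([3, 0, 2, 1], 2))  # 3 + 0*2 + 2*4 + 1*8 = 19
```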

Exercise 17.3. Let $f \in R[X]$ be a polynomial of degree $\ell > 0$ with $\mathrm{lc}(f) \in R^*$, and let $E := R[X]/(f)$. Suppose that in addition to $f$, we are given a polynomial $g \in R[X]$ of degree less than $k$ and an element $\alpha \in E$, and we want to compute $g(\alpha) \in E$. This is called the modular composition problem.

(a) Show that a straightforward application of Horner’s rule yields an algorithm that uses $O(k\ell^2)$ operations in $R$, and requires space for storing $O(\ell)$ elements of $R$.

(b) Show how to compute $g(\alpha)$ using just $O(k\ell + k^{1/2}\ell^2)$ operations in $R$, at the expense of requiring space for storing $O(k^{1/2}\ell)$ elements of $R$. Hint: first compute a table of powers $1, \alpha, \ldots, \alpha^m$, for $m \approx k^{1/2}$.

Exercise 17.4. Given polynomials $g, h \in R[X]$, show how to compute their composition $g(h) \in R[X]$ using $O(\mathrm{len}(g)^2\,\mathrm{len}(h)^2)$ operations in $R$.

Exercise 17.5. Suppose you are given three polynomials $f, g, h \in \mathbb{Z}_p[X]$, where $p$ is a large prime, in particular, $p \ge 2\deg(g)\deg(h)$. Design an efficient probabilistic algorithm that tests if $f = g(h)$ (i.e., if $f$ equals $g$ composed with $h$). Your algorithm should have the following properties: if $f = g(h)$, it should always output “true,” and otherwise, it should output “false” with probability at least 0.999. The expected running time of your algorithm should be $O((\mathrm{len}(f) + \mathrm{len}(g) + \mathrm{len}(h))\,\mathrm{len}(p)^2)$.

Exercise 17.6. Let $x, a_0, \ldots, a_{\ell-1} \in R$, and let $k$ be an integer with $0 < k \le \ell$. For $i = 0, \ldots, \ell - k$, define $g_i := \sum_{j=i}^{i+k-1} a_j X^{j-i} \in R[X]$. Show how to compute the $\ell - k + 1$ values $g_0(x), \ldots, g_{\ell-k}(x)$ using $O(\ell)$ operations in $R$.


17.2 Computing minimal polynomials in F[X]/(f) (I) 

In this section, we shall examine a computational problem to which we shall return 
on several occasions, as it will serve to illustrate a number of interesting algebraic 
and algorithmic concepts. 

Let $F$ be a field, and let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$. Also, let $E := F[X]/(f)$, which is an $F$-algebra, and in particular, an $F$-vector space. As an $F$-vector space, $E$ has dimension $\ell$. Suppose we are given an element $\alpha \in E$, and want to efficiently compute the minimal polynomial of $\alpha$ over $F$, that is, the monic polynomial $\phi \in F[X]$ of least degree such that $\phi(\alpha) = 0$, which we know has degree at most $\ell$ (see §16.5).

We can solve this problem using polynomial arithmetic and Gaussian elimination, as follows. Consider the $F$-linear map $\rho : F[X]_{\le \ell} \to E$ that sends a polynomial $g \in F[X]$ of degree at most $\ell$ to $g(\alpha)$. To perform the linear algebra, we need to specify bases for $F[X]_{\le \ell}$ and $E$. For $F[X]_{\le \ell}$, let us work with the basis $S := \{X^{\ell+1-i}\}_{i=1}^{\ell+1}$. With this choice of basis, for $g = \sum_{i=0}^{\ell} a_i X^i \in F[X]_{\le \ell}$, the coordinate vector of $g$ is $\mathrm{Vec}_S(g) = (a_\ell, \ldots, a_0) \in F^{1 \times (\ell+1)}$. For $E$, let us work with the basis $T := \{\xi^{i-1}\}_{i=1}^{\ell}$, where $\xi := [X]_f \in E$. Let
$$A := \mathrm{Mat}_{S,T}(\rho) \in F^{(\ell+1) \times \ell};$$
that is, $A$ is the matrix of $\rho$ relative to $S$ and $T$ (see §14.2). For $i = 1, \ldots, \ell + 1$, the $i$th row of $A$ is the coordinate vector $\mathrm{Vec}_T(\alpha^{\ell+1-i}) \in F^{1 \times \ell}$.

We compute the matrix $A$ by computing the powers $1, \alpha, \ldots, \alpha^\ell$, reading off the $i$th row of $A$ directly from the canonical representative of $\alpha^{\ell+1-i}$. We then apply Gaussian elimination to $A$ to find row vectors $v_1, \ldots, v_s \in F^{1 \times (\ell+1)}$ that are coordinate vectors corresponding to a basis for the kernel of $\rho$. Now, the coordinate vector of the minimal polynomial of $\alpha$ is a linear combination of $v_1, \ldots, v_s$. To find it, we form the $s \times (\ell + 1)$ matrix $B$ whose rows consist of $v_1, \ldots, v_s$, and apply Gaussian elimination to $B$, obtaining an $s \times (\ell + 1)$ matrix $B'$ in reduced row echelon form whose row space is the same as that of $B$. Let $\phi$ be the polynomial whose coordinate vector is the last row of $B'$.

Because of the choice of basis for $F[X]_{\le \ell}$, and because $B'$ is in reduced row echelon form, it is clear that no non-zero polynomial in $\mathrm{Ker}\,\rho$ has degree less than that of $\phi$. Moreover, as $\phi$ is already monic (again, by the fact that $B'$ is in reduced row echelon form), it follows that $\phi$ is in fact the minimal polynomial of $\alpha$ over $F$.

The total amount of work performed by this algorithm is $O(\ell^3)$ operations in $F$ to build the matrix $A$ (this just amounts to computing $\ell$ successive powers of $\alpha$, that is, $O(\ell)$ multiplications in $E$, each of which takes $O(\ell^2)$ operations in $F$), and $O(\ell^3)$ operations in $F$ to perform both Gaussian elimination steps.
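Below is a Python sketch of the same computation over $F = \mathbb{Z}_p$, with $f$ monic; the field and the function names are our choices. It is a variant of the algorithm above, not a line-by-line transcription: instead of computing a basis for $\mathrm{Ker}\,\rho$ and then putting it in reduced row echelon form, it row-reduces the coordinate vectors of $1, \alpha, \alpha^2, \ldots$ one at a time, recording how each reduced vector is expressed in terms of the powers, and stops at the first power that depends linearly on the lower ones. That dependency is the monic minimal polynomial, and the cost is still $O(\ell^3)$ operations in $F$.

```python
def mul_mod(a, b, f, p):
    """a*b mod f over Z_p for monic f; a, b, result are length-l coordinate vectors."""
    l = len(f) - 1
    prod = [0] * (2 * l - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    for d in range(2 * l - 2, l - 1, -1):    # X^l = -(f_0 + ... + f_{l-1} X^{l-1})
        c = prod[d]
        for j in range(l):
            prod[d - l + j] = (prod[d - l + j] - c * f[j]) % p
    return prod[:l]


def min_poly(alpha, f, p):
    """Minimal polynomial (constant term first) of alpha in Z_p[X]/(f).

    f is monic of degree l > 0; alpha is given as its length-l coordinate
    vector (its canonical representative, padded with zeros).
    """
    l = len(f) - 1
    rows = []                         # (pivot, reduced vector, combination of powers)
    power = [1] + [0] * (l - 1)       # coordinates of alpha^0
    for d in range(l + 1):
        vec, combo = power[:], [0] * (l + 1)
        combo[d] = 1                  # vec currently represents 1 * alpha^d
        for piv, rvec, rcombo in rows:
            c = vec[piv]
            if c:
                vec = [(x - c * y) % p for x, y in zip(vec, rvec)]
                combo = [(x - c * y) % p for x, y in zip(combo, rcombo)]
        piv = next((i for i, x in enumerate(vec) if x), None)
        if piv is None:               # sum_i combo[i] alpha^i = 0, with combo[d] = 1
            return combo[:d + 1]
        inv = pow(vec[piv], -1, p)
        rows.append((piv, [x * inv % p for x in vec], [x * inv % p for x in combo]))
        power = mul_mod(power, alpha, f, p)
    raise AssertionError("unreachable: 1, alpha, ..., alpha^l are linearly dependent")


# f = X^2 + 1 over Z_3; xi = [X] has minimal polynomial X^2 + 1.
print(min_poly([0, 1], [1, 0, 1], 3))  # [1, 0, 1]
```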


17.3 Euclid’s algorithm 

In this section, $F$ denotes a field, and we consider the computation of greatest common divisors in $F[X]$.

The Euclidean algorithm for integers is easily adapted to compute $\gcd(g, h)$ for polynomials $g, h \in F[X]$. Analogous to the integer case, we assume that $\deg(g) \ge \deg(h)$; however, we shall also assume that $g \ne 0$. This is not a serious restriction, of course, as $\gcd(0, 0) = 0$, and making this restriction will simplify the presentation a bit. Recall that we defined $\gcd(g, h)$ to be either zero or monic, and the assumption that $g \ne 0$ means that $\gcd(g, h)$ is non-zero, and hence monic.

The following is the analog of Theorem 4.1, and is based on the division with 
remainder property for polynomials. 

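Theorem 17.2. Let $g, h \in F[X]$, with $\deg(g) \ge \deg(h)$ and $g \ne 0$. Define the polynomials $r_0, r_1, \ldots, r_{\lambda+1} \in F[X]$ and $q_1, \ldots, q_\lambda \in F[X]$, where $\lambda \ge 0$, as follows:
$$
\begin{aligned}
g &= r_0, \\
h &= r_1, \\
r_0 &= r_1 q_1 + r_2 && (0 \le \deg(r_2) < \deg(r_1)), \\
&\ \ \vdots \\
r_{i-1} &= r_i q_i + r_{i+1} && (0 \le \deg(r_{i+1}) < \deg(r_i)), \\
&\ \ \vdots \\
r_{\lambda-2} &= r_{\lambda-1} q_{\lambda-1} + r_\lambda && (0 \le \deg(r_\lambda) < \deg(r_{\lambda-1})), \\
r_{\lambda-1} &= r_\lambda q_\lambda && (r_{\lambda+1} = 0).
\end{aligned}
$$
Note that by definition, $\lambda = 0$ if $h = 0$, and $\lambda > 0$ otherwise. Then we have $r_\lambda / \mathrm{lc}(r_\lambda) = \gcd(g, h)$, and if $h \ne 0$, then $\lambda \le \deg(h) + 1$.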

Proof. Arguing as in the proof of Theorem 4.1, one sees that
$$\gcd(g, h) = \gcd(r_0, r_1) = \cdots = \gcd(r_\lambda, r_{\lambda+1}) = \gcd(r_\lambda, 0) = r_\lambda / \mathrm{lc}(r_\lambda).$$
That proves the first statement.

For the second statement, if $h \ne 0$, then the degree sequence
$$\deg(r_1), \deg(r_2), \ldots, \deg(r_\lambda)$$
is strictly decreasing, with $\deg(r_\lambda) \ge 0$, from which it follows that $\deg(h) = \deg(r_1) \ge \lambda - 1$. □

This gives us the following polynomial version of the Euclidean algorithm: 

Euclid’s algorithm. On input $g, h$, where $g, h \in F[X]$ with $\deg(g) \ge \deg(h)$ and $g \ne 0$, compute $d = \gcd(g, h)$ as follows:

    r ← g, r' ← h
    while r' ≠ 0 do
        r'' ← r mod r'
        (r, r') ← (r', r'')
    d ← r/lc(r)    // make monic
    output d

Theorem 17.3. Euclid’s algorithm for polynomials performs $O(\mathrm{len}(g)\,\mathrm{len}(h))$ operations in $F$.

Proof. The proof is almost identical to that of Theorem 4.2. Details are left to the reader. □
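A minimal Python sketch of Euclid’s algorithm, assuming $F = \mathbb{Z}_p$ for a prime $p$ (the field and the helper names are our choices). Polynomials are coefficient lists with the constant term first and no high-order zeros, so the zero polynomial is `[]`.

```python
def trim(a):
    """Strip high-order zero coefficients."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_mod(a, b, p):
    """Remainder of a divided by non-zero b over Z_p."""
    r = [x % p for x in a]
    inv = pow(b[-1], -1, p)
    for i in range(len(r) - len(b), -1, -1):
        c = r[i + len(b) - 1] * inv % p
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] - c * bj) % p
    return trim(r)


def poly_gcd(g, h, p):
    """Monic gcd(g, h) over Z_p, assuming g != 0 and deg(g) >= deg(h)."""
    r, r1 = trim([x % p for x in g]), trim([x % p for x in h])
    while r1:                            # while r' != 0
        r, r1 = r1, poly_mod(r, r1, p)   # (r, r') <- (r', r mod r')
    inv = pow(r[-1], -1, p)
    return [x * inv % p for x in r]      # make monic


# gcd(X^2 - 1, X^2 - 3X + 2) = X - 1 over Z_7
print(poly_gcd([6, 0, 1], [2, 4, 1], 7))  # [6, 1]
```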

Just as for integers, if $d = \gcd(g, h)$, then $gF[X] + hF[X] = dF[X]$, and so there exist polynomials $s$ and $t$ such that $gs + ht = d$. The procedure for calculating $s$ and $t$ is precisely the same as in the integer case; however, in the polynomial case, we can be much more precise about the relative sizes of the objects involved in the calculation.


Theorem 17.4. Let $g$, $h$, $r_0, \ldots, r_{\lambda+1}$ and $q_1, \ldots, q_\lambda$ be as in Theorem 17.2. Define polynomials $s_0, \ldots, s_{\lambda+1} \in F[X]$ and $t_0, \ldots, t_{\lambda+1} \in F[X]$ as follows:
$$s_0 := 1, \quad t_0 := 0,$$
$$s_1 := 0, \quad t_1 := 1,$$
and for $i = 1, \ldots, \lambda$,
$$s_{i+1} := s_{i-1} - s_i q_i, \quad t_{i+1} := t_{i-1} - t_i q_i.$$

Then:

(i) for $i = 0, \ldots, \lambda + 1$, we have $gs_i + ht_i = r_i$; in particular, $gs_\lambda + ht_\lambda = \mathrm{lc}(r_\lambda)\gcd(g, h)$;

(ii) for $i = 0, \ldots, \lambda$, we have $s_i t_{i+1} - t_i s_{i+1} = (-1)^i$;

(iii) for $i = 0, \ldots, \lambda + 1$, we have $\gcd(s_i, t_i) = 1$;

(iv) for $i = 1, \ldots, \lambda + 1$, we have
$$\deg(t_i) = \deg(g) - \deg(r_{i-1}),$$
and for $i = 2, \ldots, \lambda + 1$, we have
$$\deg(s_i) = \deg(h) - \deg(r_{i-1});$$

(v) for $i = 1, \ldots, \lambda + 1$, we have $\deg(t_i) \le \deg(g)$ and $\deg(s_i) \le \deg(h)$; if $\deg(g) > 0$ and $h \ne 0$, then $\deg(t_\lambda) < \deg(g)$ and $\deg(s_\lambda) < \deg(h)$.

Proof. (i), (ii), and (iii) are proved just as in the corresponding parts of Theorem 4.3.

For (iv), the proof will hinge on the following facts:

• For $i = 1, \ldots, \lambda$, we have $\deg(r_{i-1}) \ge \deg(r_i)$, and since $q_i$ is the quotient in dividing $r_{i-1}$ by $r_i$, we have $\deg(q_i) = \deg(r_{i-1}) - \deg(r_i)$.

• For $i = 2, \ldots, \lambda$, we have $\deg(r_{i-1}) > \deg(r_i)$.

We prove the statement involving the $t_i$'s by induction on $i$, and leave the proof of the statement involving the $s_i$'s to the reader.

One can see by inspection that this statement holds for $i = 1$, since $\deg(t_1) = 0$ and $r_0 = g$. If $\lambda = 0$, there is nothing more to prove, so assume that $\lambda > 0$ and $h \ne 0$.

Now, for $i = 2$, we have $t_2 = 0 - 1 \cdot q_1 = -q_1$. Thus, $\deg(t_2) = \deg(q_1) = \deg(r_0) - \deg(r_1) = \deg(g) - \deg(r_1)$.

Now for the induction step. Assume $i \ge 3$. Then we have
$$
\begin{aligned}
\deg(t_{i-1} q_{i-1}) &= \deg(t_{i-1}) + \deg(q_{i-1}) \\
&= \deg(g) - \deg(r_{i-2}) + \deg(q_{i-1}) && \text{(by induction)} \\
&= \deg(g) - \deg(r_{i-1}) && \text{(since } \deg(q_{i-1}) = \deg(r_{i-2}) - \deg(r_{i-1})\text{)} \\
&> \deg(g) - \deg(r_{i-3}) && \text{(since } \deg(r_{i-3}) > \deg(r_{i-1})\text{)} \\
&= \deg(t_{i-2}) && \text{(by induction)}.
\end{aligned}
$$
By definition, $t_i = t_{i-2} - t_{i-1} q_{i-1}$, and from the above reasoning, we see that
$$\deg(g) - \deg(r_{i-1}) = \deg(t_{i-1} q_{i-1}) > \deg(t_{i-2}),$$
from which it follows that $\deg(t_i) = \deg(g) - \deg(r_{i-1})$.

(v) follows easily from (iv). □

From this theorem, we obtain the following algorithm: 
The extended Euclidean algorithm. On input $g, h$, where $g, h \in F[X]$ with $\deg(g) \ge \deg(h)$ and $g \ne 0$, compute $d$, $s$, and $t$, where $d, s, t \in F[X]$, $d = \gcd(g, h)$ and $gs + ht = d$, as follows:

    r ← g, r' ← h
    s ← 1, s' ← 0
    t ← 0, t' ← 1
    while r' ≠ 0 do
        compute q, r'' such that r = r'q + r'', with deg(r'') < deg(r')
        (r, s, t, r', s', t') ← (r', s', t', r'', s − s'q, t − t'q)
    c ← lc(r)
    d ← r/c, s ← s/c, t ← t/c    // make monic
    output d, s, t
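The extended algorithm in the same Python setting ($F = \mathbb{Z}_p$ for a prime $p$; helper names are ours). The tuple assignment mirrors the update step above, so the loop maintains $gs + ht = r$ and $gs' + ht' = r'$, as in part (i) of Theorem 17.4.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def padd(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x + y) % p for x, y in zip(a, b)])


def psub(a, b, p):
    return padd(a, [-y for y in b], p)


def pmul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)


def pdivmod(a, b, p):
    """(q, r) with a = b*q + r and deg(r) < deg(b) over Z_p; b != 0 and trimmed."""
    r = [x % p for x in a]
    inv = pow(b[-1], -1, p)
    q = [0] * max(len(r) - len(b) + 1, 0)
    for i in range(len(r) - len(b), -1, -1):
        q[i] = r[i + len(b) - 1] * inv % p
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] - q[i] * y) % p
    return trim(q), trim(r)


def ext_gcd(g, h, p):
    """(d, s, t) with d = gcd(g, h) monic and g*s + h*t = d; g != 0, deg(g) >= deg(h)."""
    r, s, t = trim([x % p for x in g]), [1], []
    r1, s1, t1 = trim([x % p for x in h]), [], [1]
    while r1:
        q, r2 = pdivmod(r, r1, p)
        r, s, t, r1, s1, t1 = (r1, s1, t1, r2,
                               psub(s, pmul(s1, q, p), p),
                               psub(t, pmul(t1, q, p), p))
    c_inv = [pow(r[-1], -1, p)]          # divide by c = lc(r) to make d monic
    return pmul(r, c_inv, p), pmul(s, c_inv, p), pmul(t, c_inv, p)


g, h, p = [6, 0, 1], [2, 4, 1], 7       # X^2 - 1 and X^2 - 3X + 2 over Z_7
d, s, t = ext_gcd(g, h, p)
print(d)                                             # [6, 1], i.e., X - 1
print(padd(pmul(g, s, p), pmul(h, t, p), p) == d)    # True
```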

Theorem 17.5. The extended Euclidean algorithm for polynomials performs $O(\mathrm{len}(g)\,\mathrm{len}(h))$ operations in $F$.

Proof. Exercise. □

Exercise 17.7. State and re-work the polynomial analogs of Exercises 4.2, 4.3, 
4.4, 4.5, and 4.8. 

17.4 Computing modular inverses and Chinese remaindering 

In this and the remaining sections of this chapter, we explore various applications of Euclid’s algorithm for polynomials. Most of these applications are analogous to their integer counterparts, although there are some differences to watch for. Throughout this section, $F$ denotes a field.

We begin with the obvious application of the extended Euclidean algorithm for polynomials to the problem of computing multiplicative inverses in $F[X]/(f)$.

Theorem 17.6. Suppose we are given polynomials $f, h \in F[X]$, where $\deg(h) < \deg(f)$. Then using $O(\mathrm{len}(f)^2)$ operations in $F$, we can determine if $h$ is relatively prime to $f$, and if so, compute $h^{-1} \bmod f$.

Proof. We may assume $\deg(f) > 0$, since $\deg(f) = 0$ implies $h = 0 = h^{-1} \bmod f$. We run the extended Euclidean algorithm on input $f, h$, obtaining polynomials $d, s, t$ such that $d = \gcd(f, h)$ and $fs + ht = d$. If $d \ne 1$, then $h$ does not have a multiplicative inverse modulo $f$. Otherwise, if $d = 1$, then $t$ is a multiplicative inverse of $h$ modulo $f$. Moreover, by part (v) of Theorem 17.4, we have $\deg(t) < \deg(f)$, and so $t = h^{-1} \bmod f$. Based on Theorem 17.5, it is clear that all the computations can be performed using $O(\mathrm{len}(f)^2)$ operations in $F$. □
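A sketch of the procedure in this proof, assuming $F = \mathbb{Z}_p$ for a prime $p$ (helper names are hypothetical). Since only $t$ is needed, the $s_i$'s are not tracked; otherwise the loop is the extended Euclidean algorithm on input $f, h$.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def psub(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def pmul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)


def pdivmod(a, b, p):
    """(q, r) with a = b*q + r and deg(r) < deg(b) over Z_p; b != 0 and trimmed."""
    r = [x % p for x in a]
    inv = pow(b[-1], -1, p)
    q = [0] * max(len(r) - len(b) + 1, 0)
    for i in range(len(r) - len(b), -1, -1):
        q[i] = r[i + len(b) - 1] * inv % p
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] - q[i] * y) % p
    return trim(q), trim(r)


def inverse_mod(h, f, p):
    """h^{-1} mod f over Z_p, or None if gcd(f, h) != 1; needs deg(h) < deg(f), deg(f) > 0."""
    r, t = trim([x % p for x in f]), []
    r1, t1 = trim([x % p for x in h]), [1]
    while r1:
        q, r2 = pdivmod(r, r1, p)
        r, t, r1, t1 = r1, t1, r2, psub(t, pmul(t1, q, p), p)
    if len(r) != 1:                 # gcd has positive degree: no inverse exists
        return None
    inv = pow(r[0], -1, p)          # r is a non-zero constant; scale so that d = 1
    return trim([x * inv % p for x in t])


# In Z_7[X]/(X^2 + 1): X * (-X) = -X^2 = 1, so X^{-1} = -X = 6X.
print(inverse_mod([0, 1], [1, 0, 1], 7))  # [0, 6]
```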
We also observe that the Chinese remainder theorem for polynomials (Theorem 16.19) can be made computationally effective as well:

Theorem 17.7 (Effective Chinese remainder theorem). Suppose we are given polynomials $f_1, \ldots, f_k \in F[X]$ and $g_1, \ldots, g_k \in F[X]$, where the family $\{f_i\}_{i=1}^k$ is pairwise relatively prime, and where $\deg(f_i) > 0$ and $\deg(g_i) < \deg(f_i)$ for $i = 1, \ldots, k$. Let $f := \prod_{i=1}^k f_i$. Then using $O(\mathrm{len}(f)^2)$ operations in $F$, we can compute the unique polynomial $g \in F[X]$ satisfying $\deg(g) < \deg(f)$ and $g \equiv g_i \pmod{f_i}$ for $i = 1, \ldots, k$.

Proof. Exercise (just use the formulas given after Theorem 16.19). □


Polynomial interpolation 

We remind the reader of the discussion following Theorem 16.19, where the point was made that when $f_i = X - x_i$ and $g_i = y_i$, for $i = 1, \ldots, k$, then the Chinese remainder theorem for polynomials reduces to Lagrange interpolation. Thus, Theorem 17.7 says that given distinct elements $x_1, \ldots, x_k \in F$, along with elements $y_1, \ldots, y_k \in F$, we can compute the unique polynomial $g \in F[X]$ of degree less than $k$ such that
$$g(x_i) = y_i \quad (i = 1, \ldots, k),$$
using $O(k^2)$ operations in $F$.
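A Python sketch of $O(k^2)$ Lagrange interpolation, assuming $F = \mathbb{Z}_p$ for a prime $p$ (the field and the function name are our choices). It first forms $f = \prod_i (X - x_i)$, then for each $i$ obtains $f/(X - x_i)$ by synthetic division and scales it by $y_i / \prod_{j \ne i}(x_i - x_j)$; each of these steps takes $O(k)$ operations in $F$.

```python
def interpolate(xs, ys, p):
    """Unique g of degree < k over Z_p with g(xs[i]) = ys[i]; xs distinct mod p."""
    k = len(xs)
    f = [1]                                   # f = prod (X - x_i), built incrementally
    for x in xs:
        f = [(a - x * b) % p for a, b in zip([0] + f, f + [0])]
    g = [0] * k
    for xi, yi in zip(xs, ys):
        fi, carry = [0] * k, 0                # fi = f / (X - x_i) by synthetic division
        for j in range(k, 0, -1):
            carry = (f[j] + carry * xi) % p
            fi[j - 1] = carry
        denom = 0                             # fi(x_i) = prod_{j != i} (x_i - x_j), by Horner
        for c in reversed(fi):
            denom = (denom * xi + c) % p
        scale = yi * pow(denom, -1, p) % p
        g = [(gj + scale * c) % p for gj, c in zip(g, fi)]
    while g and g[-1] == 0:
        g.pop()
    return g


# The points (0, 1), (1, 2), (2, 5) over Z_7 lie on X^2 + 1.
print(interpolate([0, 1, 2], [1, 2, 5], 7))  # [1, 0, 1]
```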

It is perhaps worth noting that we could also solve the polynomial interpolation 
problem using Gaussian elimination, by inverting the corresponding Vandermonde 
matrix (see Example 14.2). However, this algorithm would use $O(k^3)$ operations
in F. This is a specific instance of a more general phenomenon: there are many 
computational problems involving polynomials over fields that can be solved using 
Gaussian elimination, but which can be solved more efficiently using more special- 
ized algorithmic techniques. 


Speeding up algorithms via modular computation 

In §4.4, we discussed how the Chinese remainder theorem could be used to speed 
up certain types of computations involving integers. The example we gave was the 
multiplication of integer matrices. We can use the same idea to speed up certain 
types of computations involving polynomials. For example, if one wants to multiply two matrices whose entries are elements of $F[X]$, one can use the Chinese remainder theorem for polynomials to speed things up. This strategy is most easily implemented if $F$ is sufficiently large, so that we can use polynomial evaluation and interpolation directly, and do not have to worry about constructing irreducible polynomials.

Exercise 17.8. Adapt the algorithms of Exercises 4.14 and 4.15 to obtain an 
algorithm for polynomial interpolation. This algorithm is called Newton interpo- 
lation. 


17.5 Rational function reconstruction and applications 

Throughout this section, F denotes a field. 

We next state and prove the polynomial analog of Theorem 4.9. As we are now “reconstituting” a rational function, rather than a rational number, we call this procedure rational function reconstruction. Because of the relative simplicity of polynomials compared to integers, the rational reconstruction theorem for polynomials is a bit “sharper” than the rational reconstruction theorem for integers, and much simpler to prove.

To state the result precisely, let us introduce some notation. For polynomials $g, h \in F[X]$ with $\deg(g) \ge \deg(h)$ and $g \ne 0$, let us define
$$\mathrm{EEA}(g, h) := \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1},$$
where $r_i$, $s_i$, and $t_i$, for $i = 0, \ldots, \lambda + 1$, are defined as in Theorem 17.4.

Theorem 17.8 (Rational function reconstruction). Let $f, h \in F[X]$ be polynomials, and let $r^*, t^*$ be non-negative integers, such that
$$\deg(h) < \deg(f) \quad\text{and}\quad r^* + t^* \le \deg(f).$$
Further, let $\mathrm{EEA}(f, h) = \{(r_i, s_i, t_i)\}_{i=0}^{\lambda+1}$, and let $j$ be the smallest index (among $0, \ldots, \lambda + 1$) such that $\deg(r_j) < r^*$, and set
$$r' := r_j, \quad s' := s_j, \quad\text{and}\quad t' := t_j.$$
Finally, suppose that there exist polynomials $r, s, t \in F[X]$ such that
$$r = fs + ht, \quad \deg(r) < r^*, \quad\text{and}\quad 0 \le \deg(t) \le t^*.$$
Then for some non-zero polynomial $q \in F[X]$, we have
$$r = r'q, \quad s = s'q, \quad t = t'q.$$

Proof. Since $\deg(r_0) = \deg(f) \ge r^* > -\infty = \deg(r_{\lambda+1})$, the value of $j$ is well defined, and moreover, $j \ge 1$, $\deg(r_{j-1}) \ge r^*$, and $t_j \ne 0$.
From the equalities $r_j = fs_j + ht_j$ and $r = fs + ht$, we have the two congruences:
$$r_j \equiv ht_j \pmod{f}, \qquad r \equiv ht \pmod{f}.$$
Subtracting $t$ times the first from $t_j$ times the second, we obtain
$$rt_j \equiv r_j t \pmod{f}.$$

This says that $f$ divides $rt_j - r_j t$.

We want to show that, in fact, $rt_j - r_j t = 0$. To this end, first observe that by part (iv) of Theorem 17.4 and the inequality $\deg(r_{j-1}) \ge r^*$, we have
$$\deg(t_j) = \deg(f) - \deg(r_{j-1}) \le \deg(f) - r^*.$$
Combining this with the inequality $\deg(r) < r^*$, we see that
$$\deg(rt_j) = \deg(r) + \deg(t_j) < \deg(f).$$
Furthermore, using the inequalities
$$\deg(r_j) < r^*, \quad \deg(t) \le t^*, \quad\text{and}\quad r^* + t^* \le \deg(f),$$
we see that
$$\deg(r_j t) = \deg(r_j) + \deg(t) < \deg(f),$$
and it immediately follows that
$$\deg(rt_j - r_j t) < \deg(f).$$
Since $f$ divides $rt_j - r_j t$ and $\deg(rt_j - r_j t) < \deg(f)$, the only possibility is that
$$rt_j - r_j t = 0.$$

The rest of the proof follows exactly the same line of reasoning as in the last 
paragraph in the proof of Theorem 4.9, as the reader may easily verify. □ 

17.5.1 Application: recovering rational functions from their reversed Laurent series

We now discuss the polynomial analog of the application in §4.6.1. This is an 
entirely straightforward translation of the results in §4.6.1, but we shall see in the 
next chapter that this problem has its own interesting applications. 

Suppose Alice knows a rational function $z = s/t \in F(X)$, where $s$ and $t$ are polynomials with $\deg(s) < \deg(t)$, and tells Bob some of the high-order coefficients of the reversed Laurent series (see §16.8) representing $z$ in $F((X^{-1}))$. We shall show that if $\deg(t) \le \ell$ and Bob is given the bound $\ell$ on $\deg(t)$, along with the high-order $2\ell$ coefficients of $z$, then Bob can determine $z$, expressed as a rational function in lowest terms.

So suppose that $z = s/t = \sum_{i=1}^{\infty} z_i X^{-i}$, and that Alice tells Bob the coefficients $z_1, \ldots, z_{2\ell}$. Equivalently, Alice gives Bob the polynomial
$$h := z_1 X^{2\ell-1} + \cdots + z_{2\ell-1} X + z_{2\ell}.$$
Also, let us define $f := X^{2\ell}$. Here is Bob’s algorithm for recovering $z$:

1. Run the extended Euclidean algorithm on input $f, h$ to obtain $\mathrm{EEA}(f, h)$, and apply Theorem 17.8 with $f$, $h$, $r^* := \ell$, and $t^* := \ell$, to obtain the polynomials $r', s', t'$.

2. Output $s', t'$.

We claim that $z = -s'/t'$. To prove this, first observe that $h = \lfloor fz \rfloor = \lfloor fs/t \rfloor$ (see Theorem 16.32). So if we set $r := fs \bmod t$, then we have
$$r = fs - ht, \quad \deg(r) < r^*, \quad 0 \le \deg(t) \le t^*, \quad\text{and}\quad r^* + t^* \le \deg(f).$$
It follows that the polynomials $s', t'$ from Theorem 17.8 satisfy $s = s'q$ and $-t = t'q$ for some non-zero polynomial $q$, and thus, $s'/t' = -s/t$, which proves the claim.

We may further observe that since the extended Euclidean algorithm guarantees 
that gcd(s', t') = 1, not only do we obtain z, but we obtain z expressed as a fraction 
in lowest terms. 

It is clear that this algorithm takes $O(\ell^2)$ operations in $F$.
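Bob’s procedure can be sketched in Python, assuming $F = \mathbb{Z}_p$ for a prime $p$ (the field, the function name `recover_rational`, and the helpers are our choices). The loop runs the extended Euclidean algorithm on $f = X^{2\ell}$ and $h$ and stops at the first remainder of degree less than $r^* = \ell$, which is exactly the index $j$ of Theorem 17.8; the result $-s'/t'$ is then rescaled so that the denominator is monic.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def psub(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def pmul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)


def pdivmod(a, b, p):
    """(q, r) with a = b*q + r and deg(r) < deg(b) over Z_p; b != 0 and trimmed."""
    r = [x % p for x in a]
    inv = pow(b[-1], -1, p)
    q = [0] * max(len(r) - len(b) + 1, 0)
    for i in range(len(r) - len(b), -1, -1):
        q[i] = r[i + len(b) - 1] * inv % p
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] - q[i] * y) % p
    return trim(q), trim(r)


def recover_rational(zs, ell, p):
    """From zs = [z_1, ..., z_{2 ell}] (with deg(t) <= ell), return (num, den):
    z = num/den in lowest terms, den monic."""
    f = [0] * (2 * ell) + [1]                    # f = X^(2 ell)
    h = trim([z % p for z in reversed(zs)])      # h = z_1 X^(2 ell - 1) + ... + z_(2 ell)
    r, s, t = f, [1], []                         # (r_0, s_0, t_0)
    r1, s1, t1 = h, [], [1]                      # (r_1, s_1, t_1)
    while len(r1) - 1 >= ell:                    # stop at first j with deg(r_j) < r* = ell
        q, r2 = pdivmod(r, r1, p)
        r, s, t, r1, s1, t1 = (r1, s1, t1, r2,
                               psub(s, pmul(s1, q, p), p),
                               psub(t, pmul(t1, q, p), p))
    inv = pow(t1[-1], -1, p)                     # z = -s'/t'; make the denominator monic
    return trim([-x * inv % p for x in s1]), trim([x * inv % p for x in t1])


# z = (X + 2)/(X^2 + 1) over Z_7 expands as X^-1 + 2X^-2 - X^-3 - 2X^-4 + ...
print(recover_rational([1, 2, 6, 5], 2, 7))  # ([2, 1], [1, 0, 1])
```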


17.5.2 Application: polynomial interpolation with errors 
We now discuss the polynomial analog of the application in §4.6.2. 

If we “encode” a polynomial $g \in F[X]$, with $\deg(g) < k$, as the sequence $(y_1, \ldots, y_k) \in F^{\times k}$, where $y_i = g(x_i)$, then we can efficiently recover $g$ from this encoding, using an algorithm for polynomial interpolation. Here, of course, the $x_i$'s are distinct elements of $F$.

Now suppose that Alice encodes $g$ as $(y_1, \ldots, y_k)$, and sends this encoding to Bob, but that some, say at most $\ell$, of the $y_i$'s may be corrupted during transmission. Let $(z_1, \ldots, z_k)$ denote the vector actually received by Bob.

Here is how we can use Theorem 17.8 to recover the original value of $g$ from $(z_1, \ldots, z_k)$, assuming:

• the original polynomial $g$ has degree less than $m$,

• at most $\ell$ errors occur in transmission, and

• $k \ge 2\ell + m$.

Let us set $f_i := X - x_i$ for $i = 1, \ldots, k$, and $f := f_1 \cdots f_k$. Now, suppose Bob obtains the corrupted encoding $(z_1, \ldots, z_k)$. Here is what Bob does to recover $g$:
1. Interpolate, obtaining a polynomial $h$, with $\deg(h) < k$ and $h(x_i) = z_i$ for $i = 1, \ldots, k$.

2. Run the extended Euclidean algorithm on input $f, h$ to obtain $\mathrm{EEA}(f, h)$, and apply Theorem 17.8 with $f$, $h$, $r^* := m + \ell$ and $t^* := \ell$, to obtain the polynomials $r', s', t'$.

3. If $t' \mid r'$, output $r'/t'$; otherwise, output “error.”

We claim that the above procedure outputs $g$, under the assumptions listed above. To see this, let $t$ be the product of the $f_i$'s for those values of $i$ where an error occurred. Now, assuming at most $\ell$ errors occurred, we have $\deg(t) \le \ell$. Also, let $r := gt$, and note that $\deg(r) < m + \ell$. We claim that
$$r \equiv ht \pmod{f}. \tag{17.1}$$
Since the $f_i$'s are pairwise relatively prime, to show that (17.1) holds, it suffices to show that
$$gt \equiv ht \pmod{f_i} \tag{17.2}$$
for all $i = 1, \ldots, k$. To show this, consider first an index $i$ at which no error occurred, so that $y_i = z_i$. Then $gt \equiv y_i t \pmod{f_i}$ and $ht \equiv z_i t = y_i t \pmod{f_i}$, and so (17.2) holds for this $i$. Next, consider an index $i$ for which an error occurred. Then by construction, $gt \equiv 0 \pmod{f_i}$ and $ht \equiv 0 \pmod{f_i}$, and so (17.2) holds for this $i$. Thus, (17.1) holds, from which it follows that the values $r', t'$ obtained from Theorem 17.8 satisfy
$$\frac{r'}{t'} = \frac{r}{t} = \frac{gt}{t} = g.$$
One easily checks that both the procedures to encode and decode a value $g$ run in time $O(k^2)$. The above scheme is an example of an error correcting code called a Reed-Solomon code.
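A sketch of the decoder, assuming $F = \mathbb{Z}_p$ for a prime $p$ and evaluation points given as distinct residues mod $p$ (the field and all function names are our choices). Interpolation is Lagrange’s $O(k^2)$ method, the Euclidean loop stops at the first remainder of degree less than $r^* = m + \ell$, and only the $r_i$'s and $t_i$'s are tracked, since $s'$ is not used in the output.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def psub(a, b, p):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def pmul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)


def pdivmod(a, b, p):
    """(q, r) with a = b*q + r and deg(r) < deg(b) over Z_p; b != 0 and trimmed."""
    r = [x % p for x in a]
    inv = pow(b[-1], -1, p)
    q = [0] * max(len(r) - len(b) + 1, 0)
    for i in range(len(r) - len(b), -1, -1):
        q[i] = r[i + len(b) - 1] * inv % p
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] - q[i] * y) % p
    return trim(q), trim(r)


def vanishing(xs, p):
    """f = prod (X - x_i) over Z_p."""
    f = [1]
    for x in xs:
        f = [(a - x * b) % p for a, b in zip([0] + f, f + [0])]
    return f


def interpolate(xs, ys, p):
    """Lagrange interpolation in O(k^2) operations over Z_p."""
    k, f = len(xs), vanishing(xs, p)
    g = [0] * k
    for xi, yi in zip(xs, ys):
        fi, carry = [0] * k, 0               # fi = f / (X - x_i)
        for j in range(k, 0, -1):
            carry = (f[j] + carry * xi) % p
            fi[j - 1] = carry
        denom = 0
        for c in reversed(fi):               # fi(x_i) by Horner's rule
            denom = (denom * xi + c) % p
        scale = yi * pow(denom, -1, p) % p
        g = [(gj + scale * c) % p for gj, c in zip(g, fi)]
    return trim(g)


def rs_decode(xs, zs, m, ell, p):
    """Recover g (deg g < m) from z_i with at most ell errors, k >= 2*ell + m; None = "error"."""
    r, t = vanishing(xs, p), []                  # (r_0, t_0) = (f, 0)
    r1, t1 = interpolate(xs, zs, p), [1]         # (r_1, t_1) = (h, 1)
    while len(r1) - 1 >= m + ell:                # first j with deg(r_j) < r* = m + ell
        q, r2 = pdivmod(r, r1, p)
        r, t, r1, t1 = r1, t1, r2, psub(t, pmul(t1, q, p), p)
    q, rem = pdivmod(r1, t1, p)                  # output r'/t' if t' divides r'
    return q if not rem else None


# g = 4 (m = 1), at most one error (ell = 1), k = 3 points over Z_7; z_3 is corrupted.
print(rs_decode([0, 1, 2], [4, 4, 0], 1, 1, 7))  # [4]
```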


17.5.3 Applications to symbolic algebra 

Rational function reconstruction has applications in symbolic algebra, analogous to those discussed in §4.6.3. In that section, we discussed the application of solving systems of linear equations over the integers using rational reconstruction. In exactly the same way, one can use rational function reconstruction to solve systems of linear equations over $F[X]$; the solution to such a system of equations will be a vector whose entries are elements of $F(X)$, the field of rational functions.


Exercise 17.9. Consider again the secret sharing problem, as discussed in Example 8.28. There, we presented a scheme that distributes shares of a secret among several parties in such a way that no coalition of $k$ or fewer parties can reconstruct the secret, while every coalition of $k + 1$ parties can. Now suppose that some parties may be corrupt: in the protocol to reconstruct the secret, a corrupted party may contribute an incorrect share. Show how to modify the protocol in Example 8.28 so that if shares are distributed among several parties, then

(a) no coalition of $k$ or fewer parties can reconstruct the secret, and

(b) if at most $k$ parties are corrupt, then every coalition of $3k + 1$ parties (which may include some of the corrupted parties) can correctly reconstruct the secret.

The following exercises are the polynomial analogs of Exercises 4.20, 4.22, and 
4.23. 

Exercise 17.10. Let $F$ be a field. Show that given polynomials $s, t \in F[X]$ and integer $k$, with $\deg(s) < \deg(t)$ and $k > 0$, we can compute the $k$th coefficient in the reversed Laurent series representing $s/t$ using $O(\mathrm{len}(k)\,\mathrm{len}(t)^2)$ operations in $F$.

Exercise 17.11. Let $F$ be a field. Let $z \in F((X^{-1}))$ be a reversed Laurent series whose coefficient sequence is ultimately periodic. Show that $z \in F(X)$.

Exercise 17.12. Let $F$ be a field. Let $z = s/t$, where $s, t \in F[X]$, $\deg(s) < \deg(t)$, and $\gcd(s, t) = 1$.

(a) Show that if $F$ is finite, there exist integers $k, k'$ such that $0 \le k < k'$ and $sX^k \equiv sX^{k'} \pmod{t}$.

(b) Show that for integers $k, k'$ with $0 \le k < k'$, the sequence of coefficients of the reversed Laurent series representing $z$ is $(k, k' - k)$-periodic if and only if $sX^k \equiv sX^{k'} \pmod{t}$.

(c) Show that if $F$ is finite and $X \nmid t$, then the reversed Laurent series representing $z$ is purely periodic with period equal to the multiplicative order of $[X]_t \in (F[X]/(t))^*$.

(d) More generally, show that if $F$ is finite and $t = X^k t'$, with $X \nmid t'$, then the reversed Laurent series representing $z$ is ultimately periodic with pre-period $k$ and period equal to the multiplicative order of $[X]_{t'} \in (F[X]/(t'))^*$.


17.6 Faster polynomial arithmetic (*) 

The algorithms discussed in §3.5 for faster integer arithmetic are easily adapted to polynomials over a ring. Throughout this section, $R$ denotes a non-trivial ring.


Exercise 17.13. State and re-work the analog of Exercise 3.41 for $R[X]$. Your algorithm should multiply two polynomials over $R$ of length at most $\ell$ using $O(\ell^{\log_2 3})$ operations in $R$.

It is in fact possible to multiply polynomials over $R$ of length at most $\ell$ using $O(\ell\,\mathrm{len}(\ell)\,\mathrm{len}(\mathrm{len}(\ell)))$ operations in $R$; we shall develop some of the ideas that lead to such a result below in Exercises 17.21-17.24 (see also the discussion in §17.7).

In Exercises 17.14-17.19 below, assume that we have an algorithm that multiplies two polynomials over $R$ of length at most $\ell$ using at most $M(\ell)$ operations in $R$, where $M$ is a well-behaved complexity function (as defined in §3.5).

Exercise 17.14. State and re-work the analog of Exercises 3.46 and 3.47 for 
R[X]. 

Exercise 17.15. This problem is the analog of Exercise 3.48 for $R[X]$. Let
us first define the notion of a “floating point” reversed Laurent series $\hat{z}$, which
is represented as a pair $(g, e)$, where $g \in R[X]$ and $e \in \mathbb{Z}$; the value of $\hat{z}$ is
$gX^e \in R((X^{-1}))$, and we call $\mathrm{len}(g)$ the precision of $\hat{z}$. We say that $\hat{z}$ is a length
$k$ approximation of $z \in R((X^{-1}))$ if $\hat{z}$ has precision $k$ and $\hat{z} = (1 + \epsilon)z$ for
$\epsilon \in R((X^{-1}))$ with $\deg(\epsilon) \le -k$, which is the same as saying that the high-order $k$
coefficients of $\hat{z}$ and $z$ are equal. Show that given $h \in R[X]$ with $\mathrm{lc}(h) \in R^*$, and
positive integer $k$, we can compute a length $k$ approximation of $1/h \in R((X^{-1}))$
using $O(M(k))$ operations in $R$. Hint: using Newton iteration, show how to go
from a length $t$ approximation of $1/h$ to a length $2t$ approximation, making use of
just the high-order $2t$ coefficients of $h$, and using $O(M(t))$ operations in $R$.

Exercise 17.16. State and re-work the analog of Exercise 3.49 for $R[X]$.

Exercise 17.17. State and re-work the analog of Exercise 3.50 for $R[X]$. Conclude
that a polynomial of length at most $k$ can be evaluated at $k$ points using
$O(M(k)\,\mathrm{len}(k))$ operations in $R$.

Exercise 17.18. State and re-work the analog of Exercise 3.52 for $R[X]$, assuming
$2_R \in R^*$.

The next two exercises develop a useful technique known as Kronecker substi- 
tution. 

Exercise 17.19. Let $g, h \in R[X, Y]$ with $g = \sum_{i=0}^{m-1} g_i Y^i$ and $h = \sum_{i=0}^{m-1} h_i Y^i$,
where each $g_i$ and $h_i$ is a polynomial in $X$ of degree less than $k$. The product
$f := gh \in R[X, Y]$ may be written $f = \sum_{i=0}^{2m-2} f_i Y^i$, where each $f_i$ is a polynomial
in $X$. Show how to compute $f$, given $g$ and $h$, using $O(M(km))$ operations in $R$.
Hint: for an appropriately chosen integer $t > 0$, first convert $g, h$ to $\tilde{g}, \tilde{h} \in R[X]$,
where $\tilde{g} := \sum_{i=0}^{m-1} g_i X^{ti}$ and $\tilde{h} := \sum_{i=0}^{m-1} h_i X^{ti}$; next, compute $\tilde{f} := \tilde{g}\tilde{h} \in R[X]$;
finally, “read off” the $f_i$'s from the coefficients of $\tilde{f}$.
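
The substitution in the hint can be sketched as follows, taking $t := 2k - 1$. This is one choice that works: each $f_i$ has degree at most $2k - 2$, so the blocks of the packed product never overlap. The sketch assumes integer coefficients and uses a schoolbook product as a stand-in for the $M$-time multiplier; function names are hypothetical.

```python
def poly_mul(a, b):
    """Schoolbook product of coefficient lists (lowest degree first).

    Stands in for the M(l)-time multiplier assumed in the exercise.
    """
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kronecker_mul(g, h, k):
    """Multiply g = sum_i g[i] Y^i by h = sum_i h[i] Y^i.

    Each g[i], h[i] is a coefficient list in X of length at most k.
    Returns [f_0, f_1, ...], each a list of 2k - 1 coefficients, with f = g*h.
    """
    t = 2 * k - 1                   # deg(f_i) <= 2k - 2, so blocks of t never overlap

    def pack(u):                    # substitute Y -> X^t: sum_i u_i(X) X^(t*i)
        out = [0] * (t * len(u))
        for i, ui in enumerate(u):
            for j, c in enumerate(ui):
                out[t * i + j] += c
        return out

    prod = poly_mul(pack(g), pack(h))          # the single univariate product
    m2 = len(g) + len(h) - 1                   # number of Y-coefficients of f
    return [prod[t * i : t * (i + 1)] for i in range(m2)]   # read off the f_i
```

For instance, with $g = (1 + 2X) + 3Y$ and $h = X + (1 + X)Y$, the call `kronecker_mul([[1, 2], [3]], [[0, 1], [1, 1]], 2)` yields $f_0 = X + 2X^2$, $f_1 = 1 + 6X + 2X^2$ and $f_2 = 3 + 3X$.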

Exercise 17.20. Assume that integers of length at most $\ell$ can be multiplied in
time $M(\ell)$, where $M$ is a well-behaved complexity function. Let $g, h \in \mathbb{Z}[X]$ with
$g = \sum_{i=0}^{m-1} a_i X^i$ and $h = \sum_{i=0}^{m-1} b_i X^i$, where each $a_i$ and $b_i$ is a non-negative integer,
strictly less than $2^k$. The product $f := gh \in \mathbb{Z}[X]$ may be written $f = \sum_{i=0}^{2m-2} c_i X^i$,
where each $c_i$ is a non-negative integer. Show how to compute $f$, given $g$ and $h$,
in time $O(M((k + \mathrm{len}(m))m))$. Hint: for an appropriately chosen
integer $t > 0$, first convert $g, h$ to $a, b \in \mathbb{Z}$, where $a := \sum_{i=0}^{m-1} a_i 2^{ti}$ and
$b := \sum_{i=0}^{m-1} b_i 2^{ti}$; next, compute $c := ab \in \mathbb{Z}$; finally, “read off” the $c_i$'s from
the bits of $c$.
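
A sketch of the integer version, assuming Python's built-in big integers play the role of the $M(\ell)$-time multiplier. Each $c_i$ is less than $m \cdot 2^{2k}$, so slots of $t := 2k + \mathrm{len}(m)$ bits (one valid choice for the hint's $t$, with $\mathrm{len}(m)$ computed as `m.bit_length()`) keep the $c_i$'s from spilling into one another. The function name is hypothetical.

```python
def kronecker_int_mul(a, b, k):
    """Coefficients of (sum a[i] X^i) * (sum b[i] X^i), where 0 <= a[i], b[i] < 2^k.

    Uses a single integer multiplication, as in the hint of Exercise 17.20.
    """
    m = max(len(a), len(b))
    # Each c_i < m * 2^(2k) < 2^t, so t-bit slots cannot overflow into each other.
    t = 2 * k + m.bit_length()
    A = sum(ai << (t * i) for i, ai in enumerate(a))   # A = g(2^t)
    B = sum(bi << (t * i) for i, bi in enumerate(b))   # B = h(2^t)
    C = A * B                                          # C = f(2^t)
    mask = (1 << t) - 1
    return [(C >> (t * i)) & mask for i in range(len(a) + len(b) - 1)]
```

For example, `kronecker_int_mul([3, 1], [2, 5], 3)` packs the inputs as $259$ and $1282$, multiplies once, and reads off `[6, 17, 5]`, the coefficients of $(3 + X)(2 + 5X)$.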

The following exercises develop an important algorithm for multiplying polynomials
in almost-linear time. For an integer $n \ge 0$, let us call $\omega \in R$ a primitive
$2^n$th root of unity if $n \ge 1$ and $\omega^{2^{n-1}} = -1_R$, or $n = 0$ and $\omega = 1_R$; if $2_R \ne 0_R$,
then in particular, $\omega$ has multiplicative order $2^n$. For $n \ge 0$, and $\omega \in R$ a primitive
$2^n$th root of unity, let us define the $R$-linear map $\mathcal{E}_{n,\omega} : R^{\times 2^n} \to R^{\times 2^n}$ that
sends the vector $(a_0, \ldots, a_{2^n-1})$ to the vector $(g(1_R), g(\omega), \ldots, g(\omega^{2^n-1}))$, where
$g := \sum_{i=0}^{2^n-1} a_i X^i \in R[X]$.

Exercise 17.21. Suppose $2_R \in R^*$ and $\omega \in R$ is a primitive $2^n$th root of unity.

(a) Let $k$ be any integer, and consider $\gcd(k, 2^n)$, which must be of the form
$2^m$ for some $m = 0, \ldots, n$. Show that $\omega^k$ is a primitive $2^{n-m}$th root of unity.

(b) Show that if $n \ge 1$, then $\omega - 1_R \in R^*$.

(c) Show that $\omega^k - 1_R \in R^*$ for all integers $k \not\equiv 0 \pmod{2^n}$.

(d) Show that for every integer $k$, we have
$$\sum_{i=0}^{2^n-1} \omega^{ki} = \begin{cases} 2_R^n & \text{if } k \equiv 0 \pmod{2^n}, \\ 0_R & \text{if } k \not\equiv 0 \pmod{2^n}. \end{cases}$$

(e) Let $M_2$ be the 2-multiplication map on $R^{\times 2^n}$, which is a bijective, $R$-linear
map. Show that
$$\mathcal{E}_{n,\omega} \circ \mathcal{E}_{n,\omega^{-1}} = M_2^n = \mathcal{E}_{n,\omega^{-1}} \circ \mathcal{E}_{n,\omega},$$
and conclude that $\mathcal{E}_{n,\omega}$ is bijective, with $M_2^{-n} \circ \mathcal{E}_{n,\omega^{-1}}$ being its inverse. Hint:
write down the matrices representing the maps $\mathcal{E}_{n,\omega}$ and $\mathcal{E}_{n,\omega^{-1}}$.

Exercise 17.22. This exercise develops a fast algorithm, called the fast Fourier
transform or FFT, for computing the function $\mathcal{E}_{n,\omega}$. This is a recursive algorithm
$\mathrm{FFT}(n, \omega; a_0, \ldots, a_{2^n-1})$ that takes as input an integer $n \ge 0$, a primitive $2^n$th root
of unity $\omega \in R$, and elements $a_0, \ldots, a_{2^n-1} \in R$, and runs as follows:

    if n = 0 then
        return a_0
    else
        (α_0, ..., α_{2^{n-1}-1}) ← FFT(n-1, ω^2; a_0, a_2, ..., a_{2^n-2})
        (β_0, ..., β_{2^{n-1}-1}) ← FFT(n-1, ω^2; a_1, a_3, ..., a_{2^n-1})
        for i ← 0 to 2^{n-1} - 1 do
            γ_i ← α_i + β_i ω^i,   γ_{i+2^{n-1}} ← α_i - β_i ω^i
        return (γ_0, ..., γ_{2^n-1})

Show that this algorithm correctly computes $\mathcal{E}_{n,\omega}(a_0, \ldots, a_{2^n-1})$ using $O(2^n n)$ operations
in $R$.
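
A direct transcription of the FFT into Python, as a sketch over $R = \mathbb{Z}_p$. The parameters are assumptions made for illustration only: $p = 998244353 = 119 \cdot 2^{23} + 1$ is a prime commonly used for this purpose, and $3$ generates $\mathbb{Z}_p^*$, so $3^{(p-1)/2^n}$ is a primitive $2^n$th root of unity for $n \le 23$.

```python
def fft(n, omega, a, p):
    """FFT(n, omega; a_0, ..., a_{2^n - 1}) over Z_p, following Exercise 17.22.

    Returns [g(1), g(omega), ..., g(omega^(2^n - 1))] for g = sum_i a[i] X^i,
    assuming len(a) == 2^n and omega is a primitive 2^n-th root of unity mod p.
    """
    if n == 0:
        return [a[0] % p]
    half = 1 << (n - 1)
    w2 = omega * omega % p
    alpha = fft(n - 1, w2, a[0::2], p)   # even-indexed coefficients
    beta = fft(n - 1, w2, a[1::2], p)    # odd-indexed coefficients
    gamma = [0] * (2 * half)
    w = 1                                # w = omega^i
    for i in range(half):
        t = beta[i] * w % p
        gamma[i] = (alpha[i] + t) % p
        gamma[i + half] = (alpha[i] - t) % p   # uses omega^(2^(n-1)) = -1
        w = w * omega % p
    return gamma


# Example parameters (assumed): P = 119 * 2^23 + 1 is prime and 3 generates Z_P^*.
P = 998244353
```

Comparing the output with direct evaluation of $g$ at $1, \omega, \ldots, \omega^{2^n-1}$ is a quick check of the even/odd splitting.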

Exercise 17.23. Assume $2_R \in R^*$. Suppose that we are given two polynomials
$g, h \in R[X]$ of length at most $\ell$, along with a primitive $2^n$th root of unity
$\omega \in R$, where $2\ell \le 2^n < 4\ell$. Let us “pad” $g$ and $h$, writing $g = \sum_{i=0}^{2^n-1} a_i X^i$
and $h = \sum_{i=0}^{2^n-1} b_i X^i$, where $a_i$ and $b_i$ are zero for $i \ge \ell$. Show that the following
algorithm correctly computes the product of $g$ and $h$ using $O(\ell\,\mathrm{len}(\ell))$ operations
in $R$:

    (α_0, ..., α_{2^n-1}) ← FFT(n, ω; a_0, ..., a_{2^n-1})
    (β_0, ..., β_{2^n-1}) ← FFT(n, ω; b_0, ..., b_{2^n-1})
    (γ_0, ..., γ_{2^n-1}) ← (α_0 β_0, ..., α_{2^n-1} β_{2^n-1})
    (c_0, ..., c_{2^n-1}) ← 2_R^{-n} FFT(n, ω^{-1}; γ_0, ..., γ_{2^n-1})
    output Σ_{i=0}^{2ℓ-2} c_i X^i

Also, argue more carefully that the algorithm performs $O(\ell\,\mathrm{len}(\ell))$ additions and
subtractions in $R$, $O(\ell\,\mathrm{len}(\ell))$ multiplications in $R$ by powers of $\omega$, and $O(\ell)$ other
multiplications in $R$.
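
The three FFT calls can be put together as follows. The sketch repeats the recursive FFT so that the block runs on its own, works over $\mathbb{Z}_p$ with the same assumed parameters ($p = 998244353$, generator $3$), and picks the smallest $n$ with $2\ell \le 2^n$, which automatically gives $2^n < 4\ell$. The function names are hypothetical.

```python
P = 998244353  # assumed: prime 119 * 2^23 + 1, with 3 generating Z_P^*


def fft(n, omega, a):
    """Recursive FFT of Exercise 17.22 over Z_P (len(a) == 2^n)."""
    if n == 0:
        return [a[0] % P]
    half = 1 << (n - 1)
    w2 = omega * omega % P
    alpha, beta = fft(n - 1, w2, a[0::2]), fft(n - 1, w2, a[1::2])
    gamma, w = [0] * (2 * half), 1
    for i in range(half):
        t = beta[i] * w % P
        gamma[i], gamma[i + half] = (alpha[i] + t) % P, (alpha[i] - t) % P
        w = w * omega % P
    return gamma


def fft_multiply(g, h):
    """Product of non-empty coefficient lists g, h over Z_P, as in Exercise 17.23."""
    l = max(len(g), len(h))
    n = 0
    while (1 << n) < 2 * l:            # smallest n with 2l <= 2^n, hence 2^n < 4l
        n += 1
    size = 1 << n
    omega = pow(3, (P - 1) >> n, P)    # primitive 2^n-th root of unity (needs n <= 23)
    a = list(g) + [0] * (size - len(g))   # pad with zeros
    b = list(h) + [0] * (size - len(h))
    gamma = [x * y % P for x, y in zip(fft(n, omega, a), fft(n, omega, b))]
    c = fft(n, pow(omega, -1, P), gamma)
    scale = pow(size, -1, P)           # the factor 2_R^(-n)
    return [x * scale % P for x in c[: len(g) + len(h) - 1]]
```

For example, `fft_multiply([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`; as long as the true coefficients are below $p$, the result agrees with ordinary integer polynomial multiplication.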

Exercise 17.24. Assume $2_R \in R^*$. In this exercise, we use the FFT to develop an
algorithm that multiplies polynomials over $R$ of length at most $\ell$ using $O(\ell\,\mathrm{len}(\ell)^\beta)$
operations in $R$, where $\beta$ is a constant. Unlike the previous exercise, we do not
assume that $R$ contains any particular primitive roots of unity; rather, the algorithm
will create them “out of thin air.” Suppose that $g, h \in R[X]$ are of length
at most $\ell$. Set $k := \lfloor \sqrt{\ell/2} \rfloor$, $m := \lceil \ell/k \rceil$. We may write $g = \sum_{i=0}^{m-1} g_i X^{ki}$ and
$h = \sum_{i=0}^{m-1} h_i X^{ki}$, where the $g_i$'s and $h_i$'s are polynomials of length at most $k$. Let
$n$ be the integer determined by $2m \le 2^n < 4m$. Let $q := X^{2^{n-1}} + 1_R \in R[X]$,
$E := R[X]/(q)$, and $\omega := [X]_q \in E$.

(a) Show that $\omega$ is a primitive $2^n$th root of unity in $E$, and that given an element
$\zeta \in E$ and an integer $i$ between $0$ and $2^n - 1$, we can compute $\zeta\omega^i \in E$
using $O(\ell^{1/2})$ operations in $R$.

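(b) Let $\hat{g} := \sum_{i=0}^{m-1} [g_i]_q Y^i \in E[Y]$ and $\hat{h} := \sum_{i=0}^{m-1} [h_i]_q Y^i \in E[Y]$. Using
the FFT (over $E$), show how to compute $\hat{f} := \hat{g}\hat{h} \in E[Y]$ by computing
$O(\ell^{1/2})$ products in $R[X]$ of polynomials of length $O(\ell^{1/2})$, along with
$O(\ell\,\mathrm{len}(\ell))$ additional operations in $R$.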

(c) Show how to compute the coefficients of $f := gh \in R[X]$ from the value
$\hat{f} \in E[Y]$ computed in part (b), using $O(\ell)$ operations in $R$.

(d) Based on parts (a)-(c), we obtain a recursive multiplication algorithm: on
inputs of length at most $\ell$, it performs at most $\alpha_0 \ell\,\mathrm{len}(\ell)$ operations in $R$,
and calls itself recursively on at most $\alpha_1 \ell^{1/2}$ subproblems, each of length
at most $\alpha_2 \ell^{1/2}$; here, $\alpha_0$, $\alpha_1$ and $\alpha_2$ are constants. If we just perform one
level of recursion, and immediately switch to a quadratic multiplication
algorithm, we obtain an algorithm whose operation count is $O(\ell^{1.5})$. If we
perform two levels of recursion, this is reduced to $O(\ell^{1.25})$. For practical
purposes, this is probably enough; however, to get an asymptotically better
complexity bound, we can let the algorithm recurse all the way down to
inputs of some (appropriately chosen) constant length. Show that if we do
this, the operation count of the recursive algorithm is $O(\ell\,\mathrm{len}(\ell)^\beta)$ for some
constant $\beta$ (whose value depends on $\alpha_1$ and $\alpha_2$).
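
Part (a) is what makes the FFT over $E$ cheap: multiplying by a power of $\omega = [X]_q$ is a signed cyclic shift of the coefficient vector, with no multiplications in $R$ at all. A minimal sketch, assuming elements of $E$ are stored as lists of $N = 2^{n-1}$ coefficients, with Python integers standing in for elements of $R$; the function name is hypothetical.

```python
def times_omega_power(zeta, i):
    """Multiply zeta in E = R[X]/(X^N + 1) by omega^i = [X]^i, for 0 <= i < 2N.

    zeta is the list of its N coefficients (lowest degree first). Only copies and
    negations are used, i.e. O(N) operations in R.
    """
    N = len(zeta)
    sign = 1
    if i >= N:                  # omega^N = -1, so omega^i = -omega^(i-N)
        i -= N
        sign = -1
    # X^i * sum_j z_j X^j: the terms with j + i >= N wrap around with a minus sign.
    shifted = [-c for c in zeta[N - i:]] + zeta[: N - i]
    return [sign * c for c in shifted]
```

For example, with $N = 4$, multiplying $1 + 2X + 3X^2 + 4X^3$ by $\omega$ gives $-4 + X + 2X^2 + 3X^3$, since $X^4 = -1$ in $E$.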

The approach used in the previous exercise was a bit sloppy. With a bit more
care, one can use the same ideas to get an algorithm that multiplies polynomials
over $R$ of length at most $\ell$ using $O(\ell\,\mathrm{len}(\ell)\,\mathrm{len}(\mathrm{len}(\ell)))$ operations in $R$, assuming
$2_R \in R^*$. The next exercise applies similar ideas, but with a few twists, to the
problem of integer multiplication.

Exercise 17.25. This exercise uses the FFT to develop a linear-time algorithm
for integer multiplication; however, a rigorous analysis depends on an unproven
conjecture (which follows from a generalization of the Riemann hypothesis). Suppose
we want to multiply two positive integers $a$ and $b$, each of length at most $\ell$
(represented internally using the data structure described in §3.3). Throughout this
exercise, assume that all computations are done on a RAM, and that arithmetic
on integers of length $O(\mathrm{len}(\ell))$ takes time $O(1)$. Let $k$ be an integer parameter
with $k = \Theta(\mathrm{len}(\ell))$, and let $m := \lceil \ell/k \rceil$. We may write $a = \sum_{i=0}^{m-1} a_i 2^{ki}$ and
$b = \sum_{i=0}^{m-1} b_i 2^{ki}$, where $0 \le a_i < 2^k$ and $0 \le b_i < 2^k$. Let $n$ be the integer
determined by $2m \le 2^n < 4m$.

(a) Assuming Conjecture 5.22, and assuming a deterministic, polynomial-time
primality test (such as the one to be presented in Chapter 21), show how
to efficiently generate a prime $p \equiv 1 \pmod{2^n}$ and an element $\omega \in \mathbb{Z}_p^*$ of
multiplicative order $2^n$, such that
$$2^{2k} m < p \le \ell^{O(1)}.$$
Your algorithm should be probabilistic, and run in expected time polynomial
in $\mathrm{len}(\ell)$.

(b) Assuming you have computed $p$ and $\omega$ as in part (a), let $g := \sum_{i=0}^{m-1} [a_i]_p X^i \in
\mathbb{Z}_p[X]$ and $h := \sum_{i=0}^{m-1} [b_i]_p X^i \in \mathbb{Z}_p[X]$, and show how to compute $f := gh \in
\mathbb{Z}_p[X]$ in time $O(\ell)$ using the FFT (over $\mathbb{Z}_p$). Here, you may store elements
of $\mathbb{Z}_p$ in single memory cells, so that operations in $\mathbb{Z}_p$ take time $O(1)$.

(c) Assuming you have computed $f \in \mathbb{Z}_p[X]$ as in part (b), show how to obtain
$c := ab$ in time $O(\ell)$.

(d) Conclude that assuming Conjecture 5.22, we can multiply two integers of
length at most $\ell$ on a RAM in time $O(\ell)$.
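
A small-scale sketch of parts (b) and (c) in Python. Instead of generating a prime as in part (a), it assumes the fixed prime $p = 998244353$ (with $3$ generating $\mathbb{Z}_p^*$) and checks the condition $2^{2k} m < p$ from part (a) at run time, so it only handles inputs small enough for that condition to hold. The carry propagation of part (c) is delegated to Python's big integers, so the sketch does not model the RAM cost accounting of the exercise. Names are hypothetical.

```python
P = 998244353  # assumed fixed prime 119 * 2^23 + 1; 3 generates Z_P^*


def fft(a, omega):
    """Recursive radix-2 FFT over Z_P; len(a) is a power of 2 and omega a
    primitive len(a)-th root of unity mod P."""
    if len(a) == 1:
        return [a[0] % P]
    half = len(a) // 2
    w2 = omega * omega % P
    even, odd = fft(a[0::2], w2), fft(a[1::2], w2)
    out, w = [0] * len(a), 1
    for i in range(half):
        t = odd[i] * w % P
        out[i], out[i + half] = (even[i] + t) % P, (even[i] - t) % P
        w = w * omega % P
    return out


def digits(x, k, count):
    """The first `count` base-2^k digits of x, least significant first."""
    return [(x >> (k * i)) & ((1 << k) - 1) for i in range(count)]


def int_multiply(a, b, k=8):
    """a * b for non-negative integers, via one cyclic convolution over Z_P."""
    m = max(1, (max(a.bit_length(), b.bit_length()) + k - 1) // k)   # ceil(len / k)
    assert (1 << (2 * k)) * m < P, "need 2^(2k) m < p so coefficients do not wrap"
    size = 1
    while size < 2 * m:        # 2m <= 2^n < 4m
        size *= 2
    assert (P - 1) % size == 0, "this P supports transforms of length up to 2^23"
    omega = pow(3, (P - 1) // size, P)
    fa = fft(digits(a, k, size), omega)    # zero-padded digit vectors
    fb = fft(digits(b, k, size), omega)
    c = fft([x * y % P for x, y in zip(fa, fb)], pow(omega, -1, P))
    inv = pow(size, -1, P)
    # Each c_i < m * 2^(2k) < P, so each residue is the exact integer c_i;
    # summing the shifted c_i's performs the carries of part (c).
    return sum((ci * inv % P) << (k * i) for i, ci in enumerate(c))
```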

Note that even if one objects to our accounting practices, and insists on charging
$O(\mathrm{len}(\ell)^2)$ time units for arithmetic on numbers of length $O(\mathrm{len}(\ell))$, the algorithm
in the previous exercise runs in time $O(\ell\,\mathrm{len}(\ell)^2)$, which is “almost” linear time.

Exercise 17.26. Continuing with the previous exercise:

(a) Show how the algorithm presented there can be implemented on a RAM
that has only built-in addition, subtraction, and branching instructions, but
no multiplication or division instructions, and still run in time $O(\ell)$. Also,
memory cells should store numbers of length at most $\mathrm{len}(\ell) + O(1)$. Hint:
represent elements of $\mathbb{Z}_p$ as sequences of base-$2^t$ digits, where $t \approx \alpha\,\mathrm{len}(\ell)$
for some constant $\alpha < 1$; use table lookup to multiply $t$-bit numbers, and to
perform $2t$-by-$t$-bit divisions; for $\alpha$ sufficiently small, you can build these
tables in time $o(\ell)$.

(b) Using Theorem 5.23, show how to make this algorithm fully deterministic
and rigorous, assuming that on inputs of length $\ell$, it is provided with a
certain bit string $\sigma_\ell$ of length $O(\mathrm{len}(\ell))$ (this is called a non-uniform
algorithm).

Exercise 17.27. This exercise shows how the algorithm in Exercise 17.25 can
be made quite concrete, and fairly practical, as well.

(a) The number $p := 2^{59} \cdot 27 + 1$ is a 64-bit prime. Show how to use this value
of $p$ in conjunction with the algorithm in Exercise 17.25 with $k = 20$ and
any value of $\ell$ up to $2^{27}$.

(b) The numbers $p_1 := 2^{30} \cdot 3 + 1$, $p_2 := 2^{28} \cdot 13 + 1$, and $p_3 := 2^{27} \cdot 29 + 1$ are
32-bit primes. Show how to use the Chinese remainder theorem to modify the
algorithm in Exercise 17.25, so that it uses the three primes $p_1, p_2, p_3$, and
so that it works with $k = 32$ and any value of $\ell$ up to $2^{31}$. This variant may
be quite practical on a 32-bit machine with built-in instructions for 32-bit
multiplication and 64-by-32-bit division.
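
In part (b), each coefficient $c_i$ is computed modulo $p_1$, $p_2$ and $p_3$ and then recovered from its three residues. A minimal sketch of that recombination step, assuming only that the moduli are pairwise coprime (as distinct primes are) and lifting one modulus at a time; the name `crt3` is hypothetical. Since $m = \lceil \ell/k \rceil \le 2^{26}$ when $k = 32$ and $\ell \le 2^{31}$, each $c_i < 2^{64} \cdot 2^{26} = 2^{90} < p_1 p_2 p_3$, so the recovered value is exact.

```python
# The three moduli of Exercise 17.27(b); their primality is taken from the text.
P1, P2, P3 = 3 * 2**30 + 1, 13 * 2**28 + 1, 29 * 2**27 + 1


def crt3(r1, r2, r3):
    """Return the unique x with 0 <= x < P1*P2*P3 and x = r_j (mod P_j), j = 1, 2, 3."""
    x, m = r1 % P1, P1
    for r, p in ((r2, P2), (r3, P3)):
        # Choose t with x + m*t = r (mod p); the new x then satisfies 0 <= x < m*p.
        t = (r - x) * pow(m, -1, p) % p
        x, m = x + m * t, m * p
    return x
```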

The previous three exercises indicate that we can multiply integers in essentially
linear time, both in theory and in practice. As mentioned in §3.6, there is a different,
fully deterministic and rigorously analyzed algorithm that multiplies integers
in linear time on a RAM. In fact, that algorithm works on a very restricted type
of machine called a “pointer machine,” which can be simulated in “real time” on
a RAM with a very restricted instruction set (including the type in the previous
exercise). That algorithm works with finite approximations to complex roots of
unity, rather than roots of unity in a finite field.

We close this section with a cute application of fast polynomial multiplication to 
the problem of factoring integers. 

Exercise 17.28. Let $n$ be a large, positive integer. We can factor $n$ using trial
division in time $n^{1/2+o(1)}$; however, using fast polynomial arithmetic in $\mathbb{Z}_n[X]$,
one can get a simple, deterministic, and rigorous algorithm that factors $n$ in time
$n^{1/4+o(1)}$. Note that all of the factoring algorithms discussed in Chapter 15, while
faster, are either probabilistic, or deterministic but heuristic. Assume that we can
multiply polynomials in $\mathbb{Z}_n[X]$ of length at most $\ell$ using $M(\ell)$ operations in $\mathbb{Z}_n$,
where $M$ is a well-behaved complexity function, and $M(\ell) = \ell^{1+o(1)}$ (the algorithm
from Exercise 17.24 would suffice).

(a) Let $\ell$ be a positive integer, and for $i = 1, \ldots, \ell$, let
$$a_i := \prod_{j=0}^{\ell-1} (i\ell - j) \bmod n.$$
Using fast polynomial arithmetic, show how to compute $(a_1, \ldots, a_\ell)$ in time
$\ell^{1+o(1)}\,\mathrm{len}(n)^{O(1)}$.

(b) Using the result of part (a), show how to factor $n$ in time $n^{1/4+o(1)}$ using a
deterministic algorithm.
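
The structure of part (b) can be sketched in Python. To keep the sketch short, it computes the $a_i$'s with a naive double loop, which costs $O(\ell^2)$ operations and so gives no speed-up over trial division; the point of part (a) is to replace that loop with fast polynomial arithmetic (evaluating $\prod_{j=0}^{\ell-1}(X - j)$ at the points $\ell, 2\ell, \ldots, \ell^2$). The choice `ell = isqrt(isqrt(n)) + 1` is an assumption that guarantees $\ell^2 > \lfloor\sqrt{n}\rfloor$, so the blocks cover every candidate divisor. Names are hypothetical.

```python
from math import gcd, isqrt


def block_products(n, ell):
    """a_i = prod_{j=0}^{ell-1} (i*ell - j) mod n, for i = 1, ..., ell (naive loop)."""
    result = []
    for i in range(1, ell + 1):
        a = 1
        for j in range(ell):
            a = a * (i * ell - j) % n
        result.append(a)
    return result


def smallest_factor(n):
    """Smallest prime factor of n if n > 1 is composite, else None."""
    s = isqrt(n)
    ell = isqrt(s) + 1                 # ell^2 > isqrt(n): blocks cover 1..isqrt(n)
    for i, a in enumerate(block_products(n, ell), start=1):
        if gcd(a, n) > 1:
            # Some x in ((i-1)*ell, i*ell] shares a factor with n; scan that block,
            # in increasing order, but only up to isqrt(n).
            for x in range(max(2, (i - 1) * ell + 1), min(i * ell, s) + 1):
                if gcd(x, n) > 1:
                    return x
    return None
```

Scanning the blocks in increasing order ensures that the first $x$ found is the smallest prime factor, since every smaller integer is coprime to $n$.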


17.7 Notes 

Reed-Solomon codes were first proposed by Reed and Solomon [81], although the
decoder presented here was developed later. Theorem 17.8 was proved by Mills
[68]. The Reed-Solomon code is just one way of detecting and correcting errors;
we have barely scratched the surface of this subject.

Just as in the case of integer arithmetic, the basic “pencil and paper” quadratic-time
algorithms discussed in this chapter for polynomial arithmetic are not the best
possible. The fastest known algorithms for multiplication of polynomials of length
at most $\ell$ over a ring $R$ take $O(\ell\,\mathrm{len}(\ell)\,\mathrm{len}(\mathrm{len}(\ell)))$ operations in $R$. These algorithms
are all variations on the basic FFT algorithm (see Exercise 17.23), but work
without assuming that $2_R \in R^*$ or that $R$ contains any particular primitive roots
of unity (we developed some of the ideas in Exercise 17.24). The Euclidean and
extended Euclidean algorithms for polynomials over a field $F$ can be implemented
so as to take $O(\ell\,\mathrm{len}(\ell)^2\,\mathrm{len}(\mathrm{len}(\ell)))$ operations in $F$, as can the algorithms for
Chinese remaindering and rational function reconstruction. See the book by von
zur Gathen and Gerhard [39] for details (as well as for an analysis of the Euclidean
algorithm for polynomials over the field of rational numbers and over function
fields). Depending on the setting and many implementation details, such asymptotically
fast algorithms for multiplication and division can be significantly faster than
the quadratic-time algorithms, even for quite moderately sized inputs of practical
interest. However, the fast Euclidean algorithms are only useful for significantly
larger inputs.

Exercise 17.3 is based on an algorithm of Brent and Kung [20]. Using fast
matrix and polynomial arithmetic, Brent and Kung show how to solve the modular
composition problem using $O(\ell^{(\omega+1)/2})$ operations in $R$, where $\omega$ is the exponent
for matrix multiplication (see §14.6), and so $(\omega+1)/2 < 1.7$. Modular composition
arises as a subproblem in a number of algorithms.†


† Very recently, faster algorithms for modular composition have been discovered. See the papers by C. Umans
[Fast polynomial factorization and modular composition in small characteristic, to appear in 40th Annual
ACM Symposium on Theory of Computing, 2008] and K. Kedlaya and C. Umans [Fast modular composition in
any characteristic, manuscript, April 2008], both of which are available at www.cs.caltech.edu/~umans/research.

In this chapter, we develop some of the theory of linearly generated sequences.
As an application, we develop an efficient algorithm for solving sparse systems
of linear equations, such as those that arise in the subexponential-time algorithms
for discrete logarithms and factoring in Chapter 15. These topics illustrate the
beautiful interplay between the arithmetic of polynomials, linear algebra, and the
use of randomization in the design of algorithms.


18.1 Basic definitions and properties 

Let $F$ be a field, let $V$ be an $F$-vector space, and consider an infinite sequence
$$\Psi = \{\alpha_i\}_{i=0}^\infty,$$
where $\alpha_i \in V$ for $i = 0, 1, 2, \ldots$. We say that $\Psi$ is linearly generated (over $F$)
if there exist scalars $c_0, \ldots, c_{k-1} \in F$ such that the following recurrence relation
holds:
$$\alpha_{k+i} = \sum_{j=0}^{k-1} c_j \alpha_{j+i} \quad (\text{for } i = 0, 1, 2, \ldots).$$
In this case, all of the elements of the sequence $\Psi$ are determined by the initial
segment $\alpha_0, \ldots, \alpha_{k-1}$, together with the coefficients $c_0, \ldots, c_{k-1}$ defining the recurrence
relation.

The general problem we consider is this: how to determine the coefficients defining
such a recurrence relation, given a sufficiently long initial segment of $\Psi$. To
study this problem, it turns out to be very useful to rephrase the problem slightly.
Let $g \in F[X]$ be a polynomial of degree, say, $k$, and write $g = \sum_{j=0}^k a_j X^j$. Next,
define
$$g \star \Psi := \sum_{j=0}^k a_j \alpha_j.$$

Then it is clear that $\Psi$ is linearly generated if and only if there exists a non-zero
polynomial $g$ such that
$$(X^i g) \star \Psi = 0 \quad (\text{for } i = 0, 1, 2, \ldots). \tag{18.1}$$
Indeed, if there is such a non-zero polynomial $g$, then we can take
$$c_0 := -(a_0/a_k),\ c_1 := -(a_1/a_k),\ \ldots,\ c_{k-1} := -(a_{k-1}/a_k)$$
as coefficients defining the recurrence relation for $\Psi$. We call a polynomial $g$ satisfying
(18.1) a generating polynomial for $\Psi$. The sequence $\Psi$ will in general
have many generating polynomials. Note that the zero polynomial is technically
considered a generating polynomial, but is not a very interesting one.
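
As a concrete illustration with $V = F = \mathbb{Z}_p$ (an assumption made only for this sketch), the operation $g \star \Psi$ and condition (18.1) can be checked on a finite prefix of a sequence. A prefix check can only refute, never prove, that $g$ is a generating polynomial, since (18.1) involves infinitely many $i$. The helper names are hypothetical.

```python
def star(g, seq, p):
    """g star Psi = sum_j g[j] * seq[j] over Z_p (g lists coefficients lowest degree first)."""
    return sum(c * s for c, s in zip(g, seq)) % p


def generates_prefix(g, seq, p):
    """Check (X^i g) star Psi = 0 for every i the finite prefix seq allows.

    (X^i g) star Psi only involves seq[i], ..., seq[i + deg g].
    """
    d = len(g) - 1
    return all(star(g, seq[i:], p) == 0 for i in range(len(seq) - d))


def recurrence_coeffs(g, p):
    """Coefficients c_j = -(a_j / a_k) read off a generating polynomial of degree k."""
    inv = pow(g[-1], -1, p)
    return [(-a * inv) % p for a in g[:-1]]
```

For the Fibonacci numbers modulo 101, the polynomial $X^2 - X - 1$ (stored as `[100, 100, 1]`) passes the check, and `recurrence_coeffs` recovers $c_0 = c_1 = 1$.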

Let $G(\Psi)$ be the set of all generating polynomials for $\Psi$.

Theorem 18.1. The set $G(\Psi)$ is an ideal of $F[X]$.

Proof. First, note that for all $g, h \in F[X]$, we have $(g + h) \star \Psi = (g \star \Psi) + (h \star \Psi)$;
this is clear from the definitions. It is also clear that for all $c \in F$ and $g \in F[X]$,
we have $(cg) \star \Psi = c \cdot (g \star \Psi)$. From these two observations, it follows that $G(\Psi)$
is closed under addition and scalar multiplication. It is also easy to see from the
definition that $G(\Psi)$ is closed under multiplication by $X$; indeed, if $(X^i g) \star \Psi = 0$
for all $i \ge 0$, then certainly, $(X^i(Xg)) \star \Psi = (X^{i+1} g) \star \Psi = 0$ for all $i \ge 0$. But any
non-empty subset of $F[X]$ that is closed under addition, multiplication by elements
of $F$, and multiplication by $X$ is an ideal of $F[X]$ (see Exercise 7.27). □

Since all ideals of $F[X]$ are principal, it follows that $G(\Psi)$ is the ideal of $F[X]$
generated by some polynomial $\phi \in F[X]$; we can make this polynomial unique
by choosing the monic associate (if it is non-zero), and we call this polynomial
the minimal polynomial of $\Psi$. Thus, a polynomial $g \in F[X]$ is a generating
polynomial for $\Psi$ if and only if $\phi$ divides $g$; in particular, $\Psi$ is linearly generated if
and only if $\phi \ne 0$.

We can now restate our main objective as follows: given a sufficiently long initial 
segment of a linearly generated sequence, determine its minimal polynomial. 

Example 18.1. One can always define a linearly generated sequence by simply
choosing an initial segment $\alpha_0, \alpha_1, \ldots, \alpha_{k-1}$, along with scalars $c_0, \ldots, c_{k-1} \in F$
defining the recurrence relation. One can enumerate as many elements of the
sequence as one wants by using storage for $k$ elements of $V$, along with storage for
the scalars $c_0, \ldots, c_{k-1}$, as follows:

    (β_0, ..., β_{k-1}) ← (α_0, ..., α_{k-1})
    repeat
        output β_0
        β' ← Σ_{j=0}^{k-1} c_j β_j
        (β_0, ..., β_{k-1}) ← (β_1, ..., β_{k-1}, β')
    forever

Because of the structure of the above algorithm, linearly generated sequences are
sometimes also called shift register sequences. Also observe that if $F$ is a finite
field, and $V$ is finite dimensional, the value stored in the “register” $(\beta_0, \ldots, \beta_{k-1})$
must repeat at some point. It follows that the linearly generated sequence must be
ultimately periodic (see definitions above Exercise 4.21). □
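
The shift-register loop above can be written as a Python generator, assuming $V = F = \mathbb{Z}_p$ for illustration; the name `shift_register` is hypothetical. With $k = 2$, $c_0 = c_1 = 1$ and initial segment $0, 1$, it produces the Fibonacci numbers modulo $p$, and over $\mathbb{Z}_2$ the periodic behaviour is visible immediately.

```python
from itertools import islice


def shift_register(init, coeffs, p):
    """Yield alpha_0, alpha_1, ... over Z_p, where alpha_{k+i} = sum_j coeffs[j] * alpha_{j+i}.

    Only the k-element register (beta_0, ..., beta_{k-1}) is stored, as in Example 18.1.
    """
    reg = list(init)
    while True:
        yield reg[0]                                       # output beta_0
        new = sum(c * b for c, b in zip(coeffs, reg)) % p  # beta' = sum_j c_j beta_j
        reg = reg[1:] + [new]                              # shift the register
```

For example, `list(islice(shift_register([0, 1], [1, 1], 2), 6))` gives `[0, 1, 1, 0, 1, 1]`, whose register contents repeat with period 3.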

Example 18.2. Linearly generated sequences can also arise in a natural way, as this
example and the next illustrate. Let $E := F[X]/(f)$, where $f \in F[X]$ is a monic
polynomial of degree $\ell > 0$, and let $\alpha$ be an element of $E$. Consider the sequence
$\Psi := \{\alpha^i\}_{i=0}^\infty$ of powers of $\alpha$. For every polynomial $g = \sum_{j=0}^k a_j X^j \in F[X]$, we
have
$$g \star \Psi = \sum_{j=0}^k a_j \alpha^j = g(\alpha).$$
Now, if $g(\alpha) = 0$, then clearly $(X^i g) \star \Psi = \alpha^i g(\alpha) = 0$ for all $i \ge 0$. Conversely,
if $(X^i g) \star \Psi = 0$ for all $i \ge 0$, then in particular, $g(\alpha) = 0$. Thus, $g$ is a generating
polynomial for $\Psi$ if and only if $g(\alpha) = 0$. It follows that the minimal polynomial
$\phi$ of $\Psi$ is the same as the minimal polynomial of $\alpha$ over $F$, as defined in §16.5.
Furthermore, $\phi \ne 0$, and the degree $m$ of $\phi$ may be characterized as the smallest
positive integer $m$ such that $\{\alpha^i\}_{i=0}^m$ is linearly dependent; moreover, as $E$ has
dimension $\ell$ over $F$, we must have $m \le \ell$. □


Example 18.3. Let $V$ be a vector space over $F$ of dimension $\ell > 0$, and let
$\tau : V \to V$ be an $F$-linear map. Let $\beta \in V$, and consider the sequence $\Psi := \{\alpha_i\}_{i=0}^\infty$,
where $\alpha_i = \tau^i(\beta)$; that is, $\alpha_0 = \beta$, $\alpha_1 = \tau(\beta)$, $\alpha_2 = \tau(\tau(\beta))$, and so on. For every
polynomial $g = \sum_{j=0}^k a_j X^j \in F[X]$, we have
$$g \star \Psi = \sum_{j=0}^k a_j \tau^j(\beta),$$
and for every $i \ge 0$, we have
$$(X^i g) \star \Psi = \sum_{j=0}^k a_j \tau^{i+j}(\beta) = \tau^i\Big(\sum_{j=0}^k a_j \tau^j(\beta)\Big) = \tau^i(g \star \Psi).$$
Thus, if $g \star \Psi = 0$, then clearly $(X^i g) \star \Psi = \tau^i(g \star \Psi) = \tau^i(0) = 0$ for all $i \ge 0$.
Conversely, if $(X^i g) \star \Psi = 0$ for all $i \ge 0$, then in particular, $g \star \Psi = 0$. Thus, $g$ is
a generating polynomial for $\Psi$ if and only if $g \star \Psi = 0$. The minimal polynomial $\phi$
of $\Psi$ is non-zero and its degree $m$ is at most $\ell$; indeed, $m$ may be characterized as
the least non-negative integer such that $\{\tau^i(\beta)\}_{i=0}^m$ is linearly dependent, and since
$V$ has dimension $\ell$ over $F$, we must have $m \le \ell$.

The previous example can be seen as a special case of this one, by taking $V$ to
be $E$, $\tau$ to be the $\alpha$-multiplication map on $E$, and setting $\beta$ to $1$. □

The problem of computing the minimal polynomial of a linearly generated
sequence can always be solved by means of Gaussian elimination. For example,
the minimal polynomial of the sequence discussed in Example 18.2 can be
computed using the algorithm described in §17.2. The minimal polynomial of
the sequence discussed in Example 18.3 can be computed in a similar manner.
Also, Exercise 18.3 below shows how one can reformulate another special case of
the problem so that it is easily solved by Gaussian elimination. However, in the
following sections, we will present algorithms for computing minimal polynomials
for certain types of linearly generated sequences that are much more efficient than
any algorithm based on Gaussian elimination.


Exercise 18.1. Show that the only sequence for which 1 is a generating polyno- 
mial is the “all zero” sequence. 

Exercise 18.2. Let $\Psi = \{\alpha_i\}_{i=0}^\infty$ be a sequence of elements of an $F$-vector space
$V$. Further, suppose that $\Psi$ has non-zero minimal polynomial $\phi$.

(a) Show that for all polynomials $g, h \in F[X]$, if $g \equiv h \pmod{\phi}$, then
$g \star \Psi = h \star \Psi$.

(b) Let $m := \deg(\phi)$. Show that if $g \in F[X]$ and $(X^i g) \star \Psi = 0$ for all
$i = 0, \ldots, m-1$, then $g$ is a generating polynomial for $\Psi$.


Exercise 18.3. This exercise develops an alternative characterization of linearly
generated sequences. Let $\Psi = \{z_i\}_{i=0}^\infty$ be a sequence of elements of $F$. Further,
suppose that $\Psi$ has minimal polynomial $\phi = \sum_{j=0}^m c_j X^j$ with $m > 0$ and $c_m = 1$.
Define the matrix
$$A := \begin{pmatrix} z_0 & z_1 & \cdots & z_{m-1} \\ z_1 & z_2 & \cdots & z_m \\ \vdots & \vdots & \ddots & \vdots \\ z_{m-1} & z_m & \cdots & z_{2m-2} \end{pmatrix} \in F^{m \times m}$$
and the vector
$$w := (z_m, \ldots, z_{2m-1}) \in F^{1 \times m}.$$
Show that
$$v := (-c_0, \ldots, -c_{m-1}) \in F^{1 \times m}$$
is the unique solution to the equation
$$vA = w.$$
Hint: show that the rows of $A$ form a linearly independent family of vectors by
making use of Exercise 18.2 and the fact that no non-zero polynomial of degree less than $m$
is a generating polynomial for $\Psi$.

Exercise 18.4. Let $c_0, \ldots, c_{k-1} \in F$ and $z_0, \ldots, z_{k-1} \in F$. For each $i \ge 0$, let
$$z_{k+i} := \sum_{j=0}^{k-1} c_j z_{j+i}.$$
Given $n \ge 0$, along with $c_0, \ldots, c_{k-1}$ and $z_0, \ldots, z_{k-1}$, show how to compute $z_n$
using $O(\mathrm{len}(n) k^2)$ operations in $F$.
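
One way to meet this bound, sketched in Python over $F = \mathbb{Z}_p$ (an assumption made for illustration), uses the fact that $f := X^k - \sum_{j=0}^{k-1} c_j X^j$ is a generating polynomial for the sequence. Compute $r := X^n \bmod f$ by repeated squaring, with $O(\mathrm{len}(n))$ multiplications modulo $f$ of $O(k^2)$ operations each, and return $r \star \Psi$; this equals $z_n$ by Exercise 18.2(a), because the minimal polynomial divides $f$. This is not necessarily the intended solution, and the names are hypothetical.

```python
def polymulmod(a, b, f, p):
    """a*b mod f over Z_p, with f monic of degree k; lists are lowest degree first."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    k = len(f) - 1
    for d in range(len(prod) - 1, k - 1, -1):   # cancel the top coefficients
        c = prod[d]
        if c:
            for j in range(k + 1):
                prod[d - k + j] = (prod[d - k + j] - c * f[j]) % p
    return (prod + [0] * k)[:k]


def nth_term(n, coeffs, init, p):
    """z_n for z_{k+i} = sum_j coeffs[j] z_{j+i} over Z_p, given z_0..z_{k-1} in init."""
    k = len(coeffs)
    f = [(-c) % p for c in coeffs] + [1]        # f = X^k - sum_j c_j X^j
    result = [1] + [0] * (k - 1)                # X^0 mod f
    base = polymulmod([0, 1], [1], f, p)        # X mod f
    e = n
    while e:                                    # square-and-multiply for X^n mod f
        if e & 1:
            result = polymulmod(result, base, f, p)
        base = polymulmod(base, base, f, p)
        e >>= 1
    return sum(r * z for r, z in zip(result, init)) % p   # r star Psi
```

For the Fibonacci recurrence, `nth_term(10, [1, 1], [0, 1], 10**9 + 7)` returns 55.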

Exercise 18.5. Let $V$ be a vector space over $F$, and consider the set $V^{\times\infty}$ of all
infinite sequences $\{\alpha_i\}_{i=0}^\infty$, where the $\alpha_i$'s are in $V$. Let us define the scalar product
of $g \in F[X]$ and $\Psi \in V^{\times\infty}$ as
$$g \cdot \Psi := \{(X^i g) \star \Psi\}_{i=0}^\infty \in V^{\times\infty}.$$
Show that with this scalar product, and addition defined component-wise, $V^{\times\infty}$ is
an $F[X]$-module, and that a polynomial $g \in F[X]$ is a generating polynomial for
$\Psi \in V^{\times\infty}$ if and only if $g \cdot \Psi = 0$.


18.2 Computing minimal polynomials: a special case 

We now tackle the problem of efficiently computing the minimal polynomial of a 
linearly generated sequence from a sufficiently long initial segment. 

We shall first address a special case of this problem, namely, the case where the
vector space $V$ is just the field $F$. In this case, we have
$$\Psi = \{z_i\}_{i=0}^\infty,$$
where $z_i \in F$ for $i = 0, 1, 2, \ldots$.

Suppose that we do not know the minimal polynomial $\phi$ of $\Psi$, but we know
an upper bound $M > 0$ on its degree. Then it turns out that the initial segment
$z_0, z_1, \ldots, z_{2M-1}$ completely determines $\phi$, and moreover, we can very efficiently
compute $\phi$ given this initial segment. The following theorem provides the essential
ingredient.

Theorem 18.2. Let $\Psi = \{z_i\}_{i=0}^\infty$ be a sequence of elements of $F$, and define the
reversed Laurent series
$$z := \sum_{i=0}^\infty z_i X^{-(i+1)} \in F((X^{-1})),$$
whose coefficients are the elements of the sequence $\Psi$. Then for every $g \in F[X]$,
we have $g \in G(\Psi)$ if and only if $gz \in F[X]$. In particular, $\Psi$ is linearly generated
if and only if $z$ is a rational function, in which case, its minimal polynomial is the
denominator of $z$ when expressed as a fraction in lowest terms.

Proof. Observe that for every polynomial $g \in F[X]$ and every integer $i \ge 0$,
the coefficient of $X^{-(i+1)}$ in the product $gz$ is equal to $(X^i g) \star \Psi$; just look at the
formulas defining these expressions! It follows that $g$ is a generating polynomial
for $\Psi$ if and only if the coefficients of the negative powers of $X$ in $gz$ are all zero,
which is the same as saying that $gz \in F[X]$. Further, if $g \ne 0$ and $h := gz \in F[X]$,
then $\deg(h) < \deg(g)$; this follows simply from the fact that $\deg(z) < 0$ (together
with the fact that $\deg(h) = \deg(g) + \deg(z)$). All the statements in the theorem
follow immediately from these observations. □

By virtue of Theorem 18.2, we can compute the minimal polynomial $\phi$ of $\Psi$
using the algorithm in §17.5.1 for computing the numerator and denominator of a
rational function from its reversed Laurent series expansion. More precisely, we
can compute $\phi$ given the bound $M$ on its degree, along with the first $2M$ elements
$z_0, \ldots, z_{2M-1}$ of $\Psi$, using $O(M^2)$ operations in $F$. Just for completeness, we write
down this algorithm:

1. Run the extended Euclidean algorithm on inputs
$$f := X^{2M} \quad\text{and}\quad h := z_0 X^{2M-1} + z_1 X^{2M-2} + \cdots + z_{2M-1},$$
and apply Theorem 17.8 with $f$, $h$, $r^* := M$, and $t^* := M$, to obtain the
polynomials $r', s', t'$.

2. Output $\phi := t'/\mathrm{lc}(t')$.
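
The two steps above can be sketched in Python over $F = \mathbb{Z}_p$, an assumption made only for this illustration, with schoolbook polynomial arithmetic. The loop runs the extended Euclidean algorithm on $f$ and $h$ and stops at the first remainder of degree less than $r^* = M$, which is the stopping rule of Theorem 17.8; only the $t$-column is tracked, since $s'$ is not needed. Names are hypothetical.

```python
def trim(a):
    """Remove trailing zero coefficients (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b, p):
    out = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return trim(out)


def psub(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])


def pdivmod(a, b, p):
    """Quotient and remainder of a by a non-zero, trimmed b over Z_p."""
    a, inv = a[:], pow(b[-1], -1, p)
    q = [0] * max(len(a) - len(b) + 1, 0)
    for d in range(len(a) - len(b), -1, -1):
        c = q[d] = a[d + len(b) - 1] * inv % p
        for j, bj in enumerate(b):
            a[d + j] = (a[d + j] - c * bj) % p
    return trim(q), trim(a)


def minpoly_scalar(z, M, p):
    """Minimal polynomial of {z_i} over Z_p from z_0, ..., z_(2M-1), given degree <= M."""
    r0 = [0] * (2 * M) + [1]                            # f = X^(2M)
    r1 = trim([x % p for x in reversed(z[: 2 * M])])    # h = z_0 X^(2M-1) + ... + z_(2M-1)
    t0, t1 = [], [1]
    while len(r1) - 1 >= M:                             # stop at the first deg(r) < r* = M
        q, r = pdivmod(r0, r1, p)
        r0, r1, t0, t1 = r1, r, t1, psub(t0, pmul(q, t1, p), p)
    inv = pow(t1[-1], -1, p)                            # phi = t' / lc(t')
    return [c * inv % p for c in t1]
```

For the Fibonacci numbers modulo 101, the prefix $0, 1, 1, 2$ with $M = 2$ yields `[100, 100, 1]`, that is, $X^2 - X - 1$.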


Exercise 18.6. Suppose $F$ is a finite field and that $\Psi := \{z_i\}_{i=0}^\infty$ is linearly generated,
with minimal polynomial $\phi$. Further, suppose $X \nmid \phi$. Show that $\Psi$ is purely
periodic with period equal to the multiplicative order of $[X]_\phi \in (F[X]/(\phi))^*$. Hint:
use Exercise 17.12 and Theorem 18.2.

18.3 Computing minimal polynomials: a more general case

Having dealt with the problem of finding the minimal polynomial of a linearly generated
sequence $\Psi$, whose elements lie in $F$, we address the more general problem,
where the elements of $\Psi$ lie in a vector space $V$ over $F$. We shall only deal with a
special case of this problem, but it is one which has useful applications:

• First, we shall assume that $V$ has finite dimension $\ell > 0$ over $F$.

• Second, we shall assume that the sequence $\Psi = \{\alpha_i\}_{i=0}^\infty$ has full rank, by
which we mean the following: if the minimal polynomial $\phi$ of $\Psi$ over $F$
has degree $m$, then $\{\alpha_i\}_{i=0}^{m-1}$ is linearly independent. This property implies
that the minimal polynomial of $\Psi$ is the monic polynomial $\phi \in F[X]$ of
least degree such that $\phi \star \Psi = 0$. The sequences considered in Examples
18.2 and 18.3 are of this type.

• Third, we shall assume that $F$ is a finite field.

The dual space. Before presenting our algorithm for computing minimal polynomials,
we need to discuss the dual space $D_F(V)$ of $V$ (over $F$), which consists
of all $F$-linear maps from $V$ into $F$. Thus, $D_F(V) = \mathrm{Hom}_F(V, F)$, and is a
vector space over $F$, with addition and scalar multiplication defined point-wise
(see Theorem 13.12). We shall call elements of $D_F(V)$ projections.

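Now, fix a basis $\mathcal{S} = \{\gamma_i\}_{i=1}^\ell$ for $V$. As was discussed in §14.2, every element
$\delta \in V$ has a unique coordinate vector $\mathrm{Vec}_{\mathcal{S}}(\delta) = (c_1, \ldots, c_\ell) \in F^{1\times\ell}$, where
$\delta = \sum_i c_i \gamma_i$. Moreover, the map $\mathrm{Vec}_{\mathcal{S}} : V \to F^{1\times\ell}$ is a vector space isomorphism.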

To each projection $\pi \in D_F(V)$ we may also associate the coordinate vector
$(\pi(\gamma_1), \ldots, \pi(\gamma_\ell))^\top \in F^{\ell\times 1}$. If $\mathcal{V}$ is the basis for $F$ consisting of the single element
$1_F$, then the coordinate vector of $\pi$ is $\mathrm{Mat}_{\mathcal{S},\mathcal{V}}(\pi)$, that is, the matrix of $\pi$ relative
to the bases $\mathcal{S}$ and $\mathcal{V}$. By Theorem 14.4, the map $\mathrm{Mat}_{\mathcal{S},\mathcal{V}} : D_F(V) \to F^{\ell\times 1}$ is a
vector space isomorphism.

In working with algorithms that compute with elements of $V$ and $D_F(V)$, we
shall assume that such elements are represented using coordinate vectors relative
to some convenient, fixed basis for $V$. If $\delta \in V$ has coordinate vector
$(c_1, \ldots, c_\ell) \in F^{1\times\ell}$, and $\pi \in D_F(V)$ has coordinate vector $(d_1, \ldots, d_\ell)^\top \in F^{\ell\times 1}$,
then $\pi(\delta)$ is easily computed, using $O(\ell)$ operations in $F$, as $\sum_{i=1}^\ell c_i d_i$.

We now return to the problem of computing the minimal polynomial $\phi$ of the
linearly generated sequence $\Psi = \{\alpha_i\}_{i=0}^\infty$. Assume we have a bound $M > 0$ on the
degree of $\phi$. Since $\Psi$ has full rank and $\dim_F(V) = \ell$, we may assume that $M \le \ell$.

For each $\pi \in D_F(V)$, we may consider the projected sequence $\Psi_\pi := \{\pi(\alpha_i)\}_{i=0}^\infty$.
Observe that $\phi$ is a generating polynomial for $\Psi_\pi$; indeed, for every polynomial
$g \in F[X]$, we have $g \star \Psi_\pi = \pi(g \star \Psi)$, and hence, for all $i \ge 0$, we have
$(X^i\phi) \star \Psi_\pi = \pi((X^i\phi) \star \Psi) = \pi(0) = 0$. Let $\phi_\pi \in F[X]$ denote the minimal
polynomial of $\Psi_\pi$. Since $\phi_\pi$ divides every generating polynomial of $\Psi_\pi$, and since
$\phi$ is a generating polynomial for $\Psi_\pi$, it follows that $\phi_\pi$ divides $\phi$.

This suggests the following algorithm for efficiently computing the minimal
polynomial of $\Psi$, using the first $2M$ terms of $\Psi$:

Algorithm MP. Given the first $2M$ terms of the sequence $\Psi = \{\alpha_i\}_{i=0}^\infty$, do the
following:

    g ← 1 ∈ F[X]
    repeat
        choose π ∈ D_F(V) at random
        compute the first 2M terms of the projected sequence Ψ_π
        use the algorithm in §18.2 to compute the minimal polynomial φ_π of Ψ_π
        g ← lcm(g, φ_π)
    until g ⋆ Ψ = 0
    output g

A few remarks on the above procedure are in order:

• in every iteration of the main loop, $g$ is the least common multiple of a number of divisors of $\phi$, and hence is itself a divisor of $\phi$; in particular, $\deg(g) \le M$;

• under our assumption that $\Psi$ has full rank, and since $g$ is a monic divisor of $\phi$, if $g \star \Psi = 0$, we may safely conclude that $g = \phi$ (if $\deg(g) < m := \deg(\phi)$, then $g \star \Psi$ would be a non-trivial linear combination of the linearly independent elements $\alpha_0, \ldots, \alpha_{m-1}$, and hence non-zero);

• under our assumption that $F$ is finite, choosing a random element $\pi$ of $D_F(V)$ amounts to simply choosing at random the entries of the coordinate vector of $\pi$, relative to some basis for $V$;

• we also assume that elements of $V$ are represented as coordinate vectors, so that applying a projection $\pi \in D_F(V)$ to an element of $V$ takes $O(\ell)$ operations in $F$; in particular, in each loop iteration, we can compute the first $2M$ terms of the projected sequence $\Psi_\pi$ using $O(M\ell)$ operations in $F$;

• similarly, adding two elements of $V$, or multiplying an element of $V$ by a scalar, takes $O(\ell)$ operations in $F$; in particular, in each loop iteration, we can compute $g \star \Psi$ using $O(M\ell)$ operations in $F$ (and using the first $M + 1 \le 2M$ terms of $\Psi$).
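The following Python sketch puts Algorithm MP together under assumptions not made in the text: $F$ is the prime field $\mathbb{Z}/p\mathbb{Z}$, elements of $V$ and $D_F(V)$ are lists of $\ell$ integers mod $p$, and polynomials are coefficient lists with the constant term first. In place of the algorithm of §18.2 it uses the Berlekamp-Massey algorithm, a different method that also recovers the minimal polynomial of a scalar sequence from $2M$ terms when its degree is at most $M$. All function names are hypothetical.

```python
import random


def berlekamp_massey(s, p):
    """Monic minimal polynomial (constant term first) of the scalar sequence s
    over Z/pZ; stands in for the algorithm of Section 18.2."""
    n = len(s)
    C, B = [1] + [0] * n, [1] + [0] * n   # current / previous connection polys
    L, shift, b = 0, 1, 1
    for i in range(n):
        # discrepancy of the current recurrence at position i
        d = (s[i] + sum(C[j] * s[i - j] for j in range(1, L + 1))) % p
        if d == 0:
            shift += 1
            continue
        coef, T = d * pow(b, p - 2, p) % p, C[:]
        for j in range(shift, n + 1):
            C[j] = (C[j] - coef * B[j - shift]) % p
        if 2 * L <= i:
            L, B, b, shift = i + 1 - L, T, d, 1
        else:
            shift += 1
    return [C[L - j] for j in range(L + 1)]   # reverse: X^L + c_1 X^(L-1) + ...


def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a


def pmul(a, b, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return r


def pdivmod(a, b, p):
    """Quotient and remainder of a by a non-zero, trimmed b."""
    a, q = a[:], [0] * max(len(a) - len(b) + 1, 1)
    inv = pow(b[-1], p - 2, p)
    while len(trim(a)) >= len(b):
        k, c = len(a) - len(b), a[-1] * inv % p
        q[k] = c
        for i, y in enumerate(b):
            a[i + k] = (a[i + k] - c * y) % p
    return q, a


def plcm(f, g, p):
    """lcm of two monic polynomials over Z/pZ."""
    a, b = trim(f[:]), trim(g[:])
    while b:                                   # Euclid: a ends as a gcd
        a, b = b, pdivmod(a, b, p)[1]
    inv = pow(a[-1], p - 2, p)
    a = [x * inv % p for x in a]               # make the gcd monic
    return pdivmod(pmul(f, g, p), a, p)[0]


def algorithm_mp(terms, M, p, rng=random):
    """terms: coordinate vectors alpha_0, ..., alpha_{2M-1} of a full-rank
    sequence whose minimal polynomial has degree at most M."""
    ell = len(terms[0])
    g = [1]
    while True:
        pi = [rng.randrange(p) for _ in range(ell)]        # random projection
        z = [sum(x * y for x, y in zip(a, pi)) % p for a in terms[:2 * M]]
        g = plcm(g, berlekamp_massey(z, p), p)             # g <- lcm(g, phi_pi)
        v = [0] * ell                                       # v = g * Psi
        for j, gj in enumerate(g):
            v = [(x + gj * y) % p for x, y in zip(v, terms[j])]
        if not any(v):
            return g
```

For instance, with $p = 7$ and $\alpha_i = e_0A^i$ for `A = [[0, 1, 0], [0, 0, 1], [3, 2, 0]]`, the terms satisfy $\alpha_{i+3} = 3\alpha_i + 2\alpha_{i+1}$, and `algorithm_mp` returns `[4, 5, 0, 1]`, that is, $X^3 - 2X - 3$ reduced mod 7. The output is the same whatever projections are drawn; only the number of iterations is random.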

Based on the above observations, it follows that when the algorithm halts, its output is correct, and that the cost of each loop iteration is $O(M\ell)$ operations in $F$ (the $O(M^2)$ cost of computing $\phi_\pi$ and the lcm is absorbed, since $M \le \ell$). The remaining question to be answered is this: what is the expected number of iterations of the main loop? The answer to this question is $O(1)$, which leads to a total expected cost of Algorithm MP of $O(M\ell)$ operations in $F$.
The key to establishing that the expected number of iterations of the main loop
is constant is provided by the following theorem. 

Theorem 18.3. Let $\Psi = \{\alpha_i\}_{i=0}^{\infty}$ be a linearly generated sequence over the field $F$, where the $\alpha_i$'s are elements of a vector space $V$ of finite dimension $\ell > 0$. Let $\phi$ be the minimal polynomial of $\Psi$ over $F$, let $m := \deg(\phi)$, and assume that $\Psi$ has full rank (i.e., $\{\alpha_i\}_{i=0}^{m-1}$ is linearly independent). Finally, let $F[X]_{<m}$ denote the vector space over $F$ consisting of all polynomials in $F[X]$ of degree less than $m$. Under the above assumptions, there exists a surjective $F$-linear map
$$\sigma : D_F(V) \to F[X]_{<m}$$
such that for all $\pi \in D_F(V)$, the minimal polynomial $\phi_\pi$ of the projected sequence $\Psi_\pi := \{\pi(\alpha_i)\}_{i=0}^{\infty}$ satisfies
$$\phi_\pi = \frac{\phi}{\gcd(\sigma(\pi), \phi)}.$$

Proof. While the statement of this theorem looks a bit complicated, its proof is quite straightforward, given our characterization of linearly generated sequences in Theorem 18.2 in terms of rational functions. We build the linear map $\sigma$ as the composition of two linear maps, $\sigma_0$ and $\sigma_1$.

Let us define the map
$$\sigma_0 : D_F(V) \to F((X^{-1})), \qquad \pi \mapsto \sum_{i=0}^{\infty} \pi(\alpha_i) X^{-(i+1)}.$$
We also define the map $\sigma_1$ to be the $\phi$-multiplication map on $F((X^{-1}))$, that is, the map that sends $z \in F((X^{-1}))$ to $\phi \cdot z \in F((X^{-1}))$. The map $\sigma$ is just the composition $\sigma = \sigma_1 \circ \sigma_0$. It is clear that both $\sigma_0$ and $\sigma_1$ are $F$-linear maps, and hence, so is $\sigma$.

First, observe that for $\pi \in D_F(V)$, the series $z := \sigma_0(\pi)$ is the series associated with the projected sequence $\Psi_\pi$, as in Theorem 18.2. Let $\phi_\pi$ be the minimal polynomial of $\Psi_\pi$. Since $\phi$ is a generating polynomial for $\Psi$, it is also a generating polynomial for $\Psi_\pi$. Therefore, Theorem 18.2 tells us that
$$h := \sigma(\pi) = \phi \cdot z \in F[X]_{<m},$$
and that $\phi_\pi$ is the denominator of $z$ when expressed as a fraction in lowest terms. Now, we have $z = h/\phi$, and it follows that $\phi_\pi = \phi/\gcd(h, \phi)$ is this denominator.

Second, the hypothesis that $\{\alpha_i\}_{i=0}^{m-1}$ is linearly independent implies that $\dim_F(\operatorname{Im}\sigma_0) \ge m$ (see Exercise 13.21). Also, observe that $\sigma_1$ is an injective map. Therefore, $\dim_F(\operatorname{Im}\sigma) \ge m$. In the previous paragraph, we observed that $\operatorname{Im}\sigma \subseteq F[X]_{<m}$, and since $\dim_F(F[X]_{<m}) = m$, we may conclude that $\operatorname{Im}\sigma = F[X]_{<m}$. That proves the theorem. □

Given the above theorem, we can analyze the expected number of iterations of the main loop of Algorithm MP.

First of all, we may as well assume that the degree $m$ of $\phi$ is greater than 0, as otherwise, we are sure to get $\phi$ in the very first iteration. Let $\pi_1, \ldots, \pi_s$ be the random projections chosen in the first $s$ iterations of Algorithm MP. By Theorem 18.3, each $\sigma(\pi_i)$ is uniformly distributed over $F[X]_{<m}$ (all fibers of the surjective linear map $\sigma$ have the same size), and these values are mutually independent. Moreover, at the end of loop iteration $s$ we have $g = \operatorname{lcm}(\phi_{\pi_1}, \ldots, \phi_{\pi_s}) = \phi/\gcd(\phi, \sigma(\pi_1), \ldots, \sigma(\pi_s))$, so $g = \phi$ if and only if $\gcd(\phi, \sigma(\pi_1), \ldots, \sigma(\pi_s)) = 1$.

Let us define $\Lambda^F_\phi(s)$ to be the probability that $\gcd(\phi, f_1, \ldots, f_s) = 1$, where $f_1, \ldots, f_s$ are randomly chosen from $F[X]_{<m}$. Thus, the probability that we have $g = \phi$ at the end of loop iteration $s$ is equal to $\Lambda^F_\phi(s)$. While one can analyze the quantity $\Lambda^F_\phi(s)$, it turns out to be easier, and sufficient for our purposes, to analyze a different quantity. Let us define $\Lambda^F_m(s)$ to be the probability that $\gcd(f_1, \ldots, f_s) = 1$, where $f_1, \ldots, f_s$ are randomly chosen from $F[X]_{<m}$. Clearly, $\Lambda^F_\phi(s) \ge \Lambda^F_m(s)$.

Theorem 18.4. If $F$ is a finite field of cardinality $q$, and $m$ and $s$ are positive integers, then we have
$$\Lambda^F_m(s) = 1 - 1/q^{s-1} + (q-1)/q^{sm}.$$

Proof. For each positive integer $n$, let $U_n$ denote the set of all $s$-tuples of polynomials $(f_1, \ldots, f_s) \in F[X]_{<n}^{\times s}$ with $\gcd(f_1, \ldots, f_s) = 1$, and let $u_n := |U_n|$. Also, for each monic polynomial $h \in F[X]$ of degree less than $n$, let $U_{n,h}$ denote the set of all $s$-tuples of polynomials of degree less than $n$ whose gcd is $h$. Observe that the set $U_{n,h}$ is in one-to-one correspondence with $U_{n-k}$, where $k := \deg(h)$, via the map that sends $(f_1, \ldots, f_s) \in U_{n,h}$ to $(f_1/h, \ldots, f_s/h) \in U_{n-k}$. As there are $q^k$ possible choices for $h$ of degree $k$, if we define $V_{n,k}$ to be the set of tuples $(f_1, \ldots, f_s) \in F[X]_{<n}^{\times s}$ with $\deg(\gcd(f_1, \ldots, f_s)) = k$, we see that $|V_{n,k}| = q^k u_{n-k}$. Every non-zero tuple in $F[X]_{<n}^{\times s}$ appears in exactly one of the sets $V_{n,k}$, for $k = 0, \ldots, n-1$. Taking into account the zero tuple, it follows that
$$q^{sn} = 1 + \sum_{k=0}^{n-1} q^k u_{n-k}, \tag{18.2}$$
which holds for all $n \ge 1$. Replacing $n$ by $n-1$ in (18.2), we obtain
$$q^{s(n-1)} = 1 + \sum_{k=0}^{n-2} q^k u_{n-1-k}, \tag{18.3}$$
which holds for all $n \ge 2$, and indeed, holds for $n = 1$ as well. Subtracting $q$ times (18.3) from (18.2), we deduce that for all $n \ge 1$,
$$q^{sn} - q^{sn-s+1} = 1 + u_n - q,$$
and rearranging terms:
$$u_n = q^{sn} - q^{sn-s+1} + q - 1.$$
Therefore,
$$\Lambda^F_m(s) = u_m/q^{sm} = 1 - 1/q^{s-1} + (q-1)/q^{sm}. \qquad \square$$
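Identity (18.2) and the closed form for $\Lambda^F_m(s)$ are easy to sanity-check by machine for small parameters. The Python sketch below (function names hypothetical) solves (18.2) for $u_1, u_2, \ldots$ one step at a time and compares the result with the closed form for $u_n$; it also counts coprime $s$-tuples by brute force over a prime field $\mathbb{Z}/p\mathbb{Z}$ (an assumption made only so that the arithmetic is easy). It is a numerical check for tiny cases, not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product


def u_from_recurrence(q, s, n_max):
    """Solve (18.2), q^(sn) = 1 + sum_{k=0}^{n-1} q^k u_{n-k}, for u_1..u_{n_max}."""
    u = {}
    for n in range(1, n_max + 1):
        rest = sum(q ** k * u[n - k] for k in range(1, n))
        u[n] = q ** (s * n) - 1 - rest          # the k = 0 term is u_n itself
    return u


def u_closed_form(q, s, n):
    return q ** (s * n) - q ** (s * n - s + 1) + q - 1


def poly_gcd(a, b, p):
    """Monic gcd over Z/pZ of coefficient lists (constant term first); [] is 0."""
    def trim(f):
        f = list(f)
        while f and f[-1] == 0:
            f.pop()
        return f
    a, b = trim(a), trim(b)
    while b:
        inv = pow(b[-1], p - 2, p)
        while len(a) >= len(b):                 # a <- a mod b
            c, k = a[-1] * inv % p, len(a) - len(b)
            for i, y in enumerate(b):
                a[i + k] = (a[i + k] - c * y) % p
            a = trim(a)
        a, b = b, a
    if not a:
        return a
    inv = pow(a[-1], p - 2, p)
    return [x * inv % p for x in a]


def coprime_fraction(p, m, s):
    """Fraction of s-tuples in F[X]_{<m}^s, F = Z/pZ, whose gcd is 1."""
    polys = list(product(range(p), repeat=m))  # all polynomials of degree < m
    good = 0
    for tup in product(polys, repeat=s):
        g = []
        for f in tup:
            g = poly_gcd(g, f, p)
        good += g == [1]
    return Fraction(good, p ** (s * m))


def theorem_18_4(q, m, s):
    return 1 - Fraction(1, q ** (s - 1)) + Fraction(q - 1, q ** (s * m))
```

For instance, `coprime_fraction(2, 2, 2)` and `theorem_18_4(2, 2, 2)` both give `Fraction(9, 16)`: of the 16 pairs of polynomials of degree less than 2 over $\mathbb{Z}/2\mathbb{Z}$, exactly 9 are coprime.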

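From the above theorem, it follows that for $s \ge 1$, the probability $P_s$ that Algorithm MP runs for more than $s$ loop iterations is at most $1/q^{s-1}$, since $P_s = 1 - \Lambda^F_\phi(s) \le 1 - \Lambda^F_m(s) \le 1/q^{s-1}$. If $L$ is the total number of loop iterations, then
$$E[L] = \sum_{i \ge 1} P[L \ge i] = 1 + \sum_{s \ge 1} P_s \le 1 + \sum_{s \ge 1} \frac{1}{q^{s-1}} = 1 + \frac{1}{1 - 1/q} \le 3.$$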

Let us summarize all of the above analysis with the following:


Theorem 18.5. Let $\Psi$ be a sequence of elements of an $F$-vector space $V$ of finite dimension $\ell > 0$ over $F$, where $F$ is a finite field. Assume that $\Psi$ is linearly generated over $F$ with minimal polynomial $\phi \in F[X]$ of degree $m$, and that $\Psi$ has full rank (i.e., the first $m$ terms of $\Psi$ form a linearly independent family of elements). Then given an upper bound $M > 0$ on $m$, along with the first $2M$ elements of $\Psi$, Algorithm MP correctly computes $\phi$ using an expected number of $O(M\ell)$ operations in $F$.


We close this section with the following observation. Suppose the sequence $\Psi$ is of the form $\{\tau^i(\beta)\}_{i=0}^{\infty}$, where $\beta \in V$ and $\tau : V \to V$ is an $F$-linear map. Suppose that with respect to some basis $S$ for $V$, elements of $V$ are represented by their coordinate vectors (which are elements of $F^{1\times\ell}$), and elements of $D_F(V)$ are represented by their coordinate vectors (which are elements of $F^{\ell\times 1}$). The linear map $\tau$ also has a corresponding matrix $A = \operatorname{Mat}_{S,S}(\tau) \in F^{\ell\times\ell}$, so that evaluating $\tau$ at a point $\alpha$ in $V$ corresponds to multiplying the coordinate vector of $\alpha$ on the right by $A$. Now, suppose $\beta \in V$ has coordinate vector $v \in F^{1\times\ell}$ and that $\pi \in D_F(V)$ has coordinate vector $w \in F^{\ell\times 1}$. Then if $\Psi'$ is the sequence of coordinate vectors of the elements of $\Psi$, we have
$$\Psi' = \{vA^i\}_{i=0}^{\infty} \quad\text{and}\quad \Psi_\pi = \{vA^iw\}_{i=0}^{\infty}.$$
This more concrete, matrix-oriented point of view is sometimes useful; in particular, it makes quite transparent the symmetry of the roles played by $\beta$ and $\pi$ in forming the projected sequence.
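The matrix point of view translates directly into code. The Python sketch below (a prime field $\mathbb{Z}/p\mathbb{Z}$ is assumed, and the names are hypothetical) computes the first terms of $\{vA^iw\}$ with one vector-matrix product per term. The symmetry between $\beta$ and $\pi$ shows up as the identity $vA^iw = w^\top (A^\top)^i v^\top$: swapping the roles of $v$ and $w$ and transposing $A$ yields the same scalar sequence.

```python
def projected_sequence(A, v, w, count, p):
    """First `count` terms of {v A^i w} over Z/pZ (v a row vector, w a column)."""
    ell = len(v)
    out, row = [], list(v)
    for _ in range(count):
        out.append(sum(x * y for x, y in zip(row, w)) % p)   # pi(tau^i(beta))
        row = [sum(row[i] * A[i][j] for i in range(ell)) % p  # row <- row * A
               for j in range(ell)]
    return out


def transpose(A):
    return [list(col) for col in zip(*A)]
```

With `A = [[0, 1, 0], [0, 0, 1], [3, 2, 0]]`, $p = 7$, and $v = w = e_0$, the call `projected_sequence(A, [1, 0, 0], [1, 0, 0], 6, 7)` gives `[1, 0, 0, 3, 0, 6]`, the first coordinates of $e_0A^i$.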
Exercise 18.7. If $|F| = q$ and $\phi \in F[X]$ is monic and factors into monic irreducible polynomials in $F[X]$ as $\phi = \phi_1^{e_1} \cdots \phi_r^{e_r}$, show that
$$\Lambda^F_\phi(1) = \prod_{i=1}^{r} \bigl(1 - q^{-\deg(\phi_i)}\bigr) \ge 1 - \sum_{i=1}^{r} q^{-\deg(\phi_i)}.$$
From this, conclude that the probability that Algorithm MP terminates after just one loop iteration is $1 - O(m/q)$, where $m = \deg(\phi)$. Thus, if $q$ is very large relative to $m$, it is highly likely that Algorithm MP terminates after just one iteration of the main loop.


18.4 Solving sparse linear systems 

Let $V$ be a vector space of finite dimension $\ell > 0$ over a finite field $F$, and let $\tau : V \to V$ be an $F$-linear map. The goal of this section is to develop time- and space-efficient algorithms for solving equations of the form
$$\tau(\gamma) = \delta; \tag{18.4}$$
that is, given $\tau$ and $\delta \in V$, find $\gamma \in V$ satisfying (18.4). The algorithms we develop will have the following properties: they will be probabilistic, and will use an expected number of $O(\ell^2)$ operations in $F$, an expected number of $O(\ell)$ evaluations of $\tau$, and space for $O(\ell)$ elements of $F$. By an “evaluation of $\tau$,” we mean the computation of $\tau(\alpha)$ for a given $\alpha \in V$.

We shall assume that elements of $V$ are represented as coordinate vectors with respect to some fixed basis for $V$. This means that a single element of $V$ is represented as a vector of $\ell$ elements of $F$. Now, if the matrix of $\tau$ with respect to the given basis is sparse, having, say, $\ell^{1+o(1)}$ non-zero entries, then the space required to represent $\tau$ is $\ell^{1+o(1)}$ elements of $F$, and the time required to evaluate $\tau$ is $\ell^{1+o(1)}$ operations in $F$. Under these assumptions, our algorithms to solve (18.4) use an expected number of $\ell^{2+o(1)}$ operations in $F$, and space for $\ell^{1+o(1)}$ elements of $F$. This is to be compared with standard Gaussian elimination: even if the original matrix is sparse, during the execution of the algorithm, most of the entries in the matrix may eventually be “filled in” with non-zero field elements, leading to a running time of $\Omega(\ell^3)$ operations in $F$, and a space requirement of $\Omega(\ell^2)$ elements of $F$. Thus, the algorithms presented here will be much more efficient than Gaussian elimination when the matrix of $\tau$ is sparse.

We hasten to point out that the algorithms presented here may be more efficient than Gaussian elimination in other cases, as well. All that matters is that $\tau$ can be evaluated using $o(\ell^2)$ operations in $F$ and/or represented using space for $o(\ell^2)$ elements of $F$; in either case, we obtain a time and/or space improvement over Gaussian elimination. Indeed, there are applications where the matrix of the linear map $\tau$ may not be sparse, but nevertheless has special structure that allows it to be represented and evaluated in subquadratic time and/or space.

We shall only present algorithms that work in two special, but important, cases:

• the first case is where $\tau$ is bijective;

• the second case is where $\tau$ is not bijective, $\delta = 0$, and a non-zero solution $\gamma$ to (18.4) is required (i.e., we are looking for a non-zero element of $\operatorname{Ker}\tau$).

In both cases, the key will be to use Algorithm MP in §18.3 to find the minimal polynomial $\phi$ of the linearly generated sequence
$$\Psi := \{\alpha_i\}_{i=0}^{\infty} \qquad (\alpha_i := \tau^i(\beta),\ i = 0, 1, \ldots), \tag{18.5}$$
where $\beta$ is a suitably chosen element of $V$. From the discussion in Example 18.3, this sequence has full rank, and so we may use Algorithm MP. We may use $M := \ell$ as an upper bound on the degree of $\phi$ (assuming we know nothing more about $\tau$ and $\beta$ that would allow us to use a smaller upper bound). In using Algorithm MP in this application, note that we do not want to store $\alpha_0, \ldots, \alpha_{2\ell-1}$; if we did, we would not satisfy our stated space bound. Instead of storing the $\alpha_i$'s in a “warehouse,” we use a “just in time” strategy for computing them, as follows:

• In the body of the main loop of Algorithm MP, where we calculate the projections $z_i := \pi(\alpha_i)$, for $i = 0, \ldots, 2\ell - 1$, we perform the computation as follows:

      α ← β
      for i ← 0 to 2ℓ − 1 do
          z_i ← π(α), α ← τ(α)

• In the test at the bottom of the main loop of Algorithm MP, if $g = \sum_{j=0}^{k} a_j X^j$, we compute $\nu := g \star \Psi \in V$ using the following Horner-like scheme:

      ν ← 0
      for j ← k down to 0 do
          ν ← τ(ν) + a_j · β

With this implementation, Algorithm MP uses an expected number of $O(\ell^2)$ operations in $F$, an expected number of $O(\ell)$ evaluations of $\tau$, and space for $O(\ell)$ elements of $F$. Of course, the “warehouse” strategy is faster than the “just in time” strategy by a constant factor, but it uses about $\ell$ times as much space; thus, for large $\ell$, using the “just in time” strategy is a very good time/space trade-off.
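The two “just in time” loops can be sketched in Python as follows, assuming (beyond the text) that $F = \mathbb{Z}/p\mathbb{Z}$ and that the sparse matrix $A$ of $\tau$ is stored row by row as lists of `(column, value)` pairs for its non-zero entries; the function names are hypothetical. Only a constant number of vectors of length $\ell$ is alive at any time, and each evaluation of $\tau$ costs time proportional to the number of non-zero entries of $A$.

```python
def sparse_apply(rows, alpha, p):
    """tau(alpha) = alpha * A, where rows[i] lists the (j, A[i][j]) pairs
    for the non-zero entries of row i of A."""
    out = [0] * len(alpha)
    for i, a in enumerate(alpha):
        if a:
            for j, x in rows[i]:
                out[j] = (out[j] + a * x) % p
    return out


def projections_jit(tau, beta, pi, count, p):
    """z_i = pi(tau^i(beta)) for i < count, keeping only one vector alive."""
    z, alpha = [], beta
    for _ in range(count):
        z.append(sum(x * y for x, y in zip(alpha, pi)) % p)
        alpha = tau(alpha)
    return z


def star_horner(tau, beta, g, p):
    """g * Psi = sum_j g[j] tau^j(beta), via nu <- tau(nu) + g[j] * beta."""
    nu = [0] * len(beta)
    for gj in reversed(g):
        nu = [(x + gj * b) % p for x, b in zip(tau(nu), beta)]
    return nu
```

For example, with $p = 7$ and the sparse rows `[[(1, 1)], [(2, 1)], [(0, 3), (1, 2)]]` (the matrix with rows $(0,1,0)$, $(0,0,1)$, $(3,2,0)$), `star_horner` with $g = X^3 - 2X - 3$, i.e. `[4, 5, 0, 1]`, and $\beta = e_0$ returns the zero vector, confirming that $g$ generates $\{\tau^i(e_0)\}$.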

The bijective case. Now consider the case where $\tau$ is bijective, and we want to solve (18.4) for a given $\delta \in V$. We may as well assume that $\delta \ne 0$, since otherwise, $\gamma = 0$ is the unique solution to (18.4). We proceed as follows. First, using Algorithm MP as discussed above, compute the minimal polynomial $\phi$ of the sequence $\Psi$ defined in (18.5), using $\beta := \delta$. Let $\phi = \sum_{j=0}^{m} c_j X^j$, where $c_m = 1$ and $m > 0$. Then we have
$$c_0\delta + c_1\tau(\delta) + \cdots + c_m\tau^m(\delta) = 0. \tag{18.6}$$

We claim that $c_0 \ne 0$. To prove the claim, suppose that $c_0 = 0$. Then applying $\tau^{-1}$ to (18.6), we would obtain
$$c_1\delta + \cdots + c_m\tau^{m-1}(\delta) = 0,$$
which would imply that $\phi/X$ is a generating polynomial for $\Psi$, contradicting the minimality of $\phi$. That proves the claim.

Since $c_0 \ne 0$, we can apply $\tau^{-1}$ to (18.6), and solve for $\gamma = \tau^{-1}(\delta)$ as follows:
$$\gamma = -c_0^{-1}\bigl(c_1\delta + \cdots + c_m\tau^{m-1}(\delta)\bigr).$$
To actually compute $\gamma$, we use the same “just in time” strategy as was used in the implementation of the computation of $g \star \Psi$ in Algorithm MP, which costs $O(\ell^2)$ operations in $F$, $O(\ell)$ evaluations of $\tau$, and space for $O(\ell)$ elements of $F$.
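Once $\phi$ is known, the back-substitution step is one more Horner-like pass. The sketch below assumes $F = \mathbb{Z}/p\mathbb{Z}$, takes $\tau$ as any Python function on coordinate lists, and takes the coefficient list of $\phi$ (constant term first) as given, for example from Algorithm MP; the name `solve_bijective` is hypothetical.

```python
def solve_bijective(tau, delta, phi, p):
    """Given phi = [c_0, ..., c_m], the minimal polynomial of {tau^i(delta)},
    return gamma = -c_0^{-1} (c_1 delta + ... + c_m tau^{m-1}(delta))."""
    c0 = phi[0] % p
    if c0 == 0:
        raise ValueError("c_0 = 0: tau is not bijective (or phi is wrong)")
    nu = [0] * len(delta)
    for c in reversed(phi[1:]):        # Horner on c_1 + c_2 X + ... + c_m X^{m-1}
        nu = [(x + c * d) % p for x, d in zip(tau(nu), delta)]
    scale = -pow(c0, p - 2, p) % p     # -c_0^{-1} in Z/pZ
    return [scale * x % p for x in nu]
```

For instance, for $p = 7$ and $\tau(\alpha) = \alpha A$ with `A = [[0, 1, 0], [0, 0, 1], [3, 2, 0]]`, the sequence $\{\tau^i(e_0)\}$ has minimal polynomial $X^3 - 2X - 3$, i.e. `[4, 5, 0, 1]`, and `solve_bijective` returns `[4, 0, 5]`, which $\tau$ maps back to $e_0$.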

The non-bijective case. Now consider the case where $\tau$ is not bijective, and we want to find non-zero $\gamma \in V$ such that $\tau(\gamma) = 0$. The idea is this. Suppose we choose an arbitrary, non-zero element $\beta$ of $V$, and use Algorithm MP to compute the minimal polynomial $\phi$ of the sequence $\Psi$ defined in (18.5), using this value of $\beta$. Let $\phi = \sum_{j=0}^{m} c_j X^j$, where $m > 0$ and $c_m = 1$. Then we have
$$c_0\beta + c_1\tau(\beta) + \cdots + c_m\tau^m(\beta) = 0. \tag{18.7}$$
Let
$$\gamma := c_1\beta + \cdots + c_m\tau^{m-1}(\beta).$$
We must have $\gamma \ne 0$, since $\gamma = 0$ would imply that $\lfloor\phi/X\rfloor$ is a non-zero generating polynomial for $\Psi$, contradicting the minimality of $\phi$. If it happens that $c_0 = 0$, then equation (18.7) implies that $\tau(\gamma) = 0$, and we are done. As before, to actually compute $\gamma$, we use the same “just in time” strategy as was used in the implementation of the computation of $g \star \Psi$ in Algorithm MP, which costs $O(\ell^2)$ operations in $F$, $O(\ell)$ evaluations of $\tau$, and space for $O(\ell)$ elements of $F$.

The above approach fails if $c_0 \ne 0$. However, in this “bad” case, equation (18.7) implies that $\beta = -c_0^{-1}\tau(\gamma)$; in particular, $\beta \in \operatorname{Im}\tau$. One way to avoid such a “bad” $\beta$ is to randomize: as $\tau$ is not surjective, the image of $\tau$ is a subspace of $V$ of dimension strictly less than $\ell$, and therefore, a randomly chosen $\beta$ lies in the image of $\tau$ with probability at most $1/|F|$. So a simple technique is to repeatedly choose $\beta$ at random until we get a “good” $\beta$. The overall complexity of the resulting algorithm will be as required: $O(\ell^2)$ expected operations in $F$, $O(\ell)$ expected evaluations of $\tau$, and space for $O(\ell)$ elements of $F$.
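Here is a sketch of the randomized non-bijective case in Python, again over an assumed prime field $\mathbb{Z}/p\mathbb{Z}$ with $\tau$ given as a function on coordinate lists. To keep it short, the minimal polynomial of $\{\tau^i(\beta)\}$ is computed by a simplified variant of Algorithm MP that discards $\phi_\pi$ and retries with a fresh projection whenever $\phi_\pi \star \Psi \ne 0$, instead of accumulating an lcm; this is still correct by the full-rank argument of §18.3, though it may need more iterations. Berlekamp-Massey stands in for the algorithm of §18.2, and all names are hypothetical.

```python
import random


def berlekamp_massey(s, p):
    """Monic minimal polynomial (constant term first) of a scalar sequence."""
    n = len(s)
    C, B = [1] + [0] * n, [1] + [0] * n
    L, shift, b = 0, 1, 1
    for i in range(n):
        d = (s[i] + sum(C[j] * s[i - j] for j in range(1, L + 1))) % p
        if d == 0:
            shift += 1
            continue
        coef, T = d * pow(b, p - 2, p) % p, C[:]
        for j in range(shift, n + 1):
            C[j] = (C[j] - coef * B[j - shift]) % p
        if 2 * L <= i:
            L, B, b, shift = i + 1 - L, T, d, 1
        else:
            shift += 1
    return [C[L - j] for j in range(L + 1)]


def star(tau, beta, g, p):
    """g * Psi = sum_j g[j] tau^j(beta), by the Horner-like scheme."""
    nu = [0] * len(beta)
    for c in reversed(g):
        nu = [(x + c * b) % p for x, b in zip(tau(nu), beta)]
    return nu


def seq_min_poly(tau, beta, p, rng):
    """Minimal polynomial of {tau^i(beta)} (retry variant of Algorithm MP)."""
    ell = len(beta)
    while True:
        pi = [rng.randrange(p) for _ in range(ell)]
        z, alpha = [], beta
        for _ in range(2 * ell):                 # just in time, with M = ell
            z.append(sum(x * y for x, y in zip(alpha, pi)) % p)
            alpha = tau(alpha)
        phi = berlekamp_massey(z, p)             # phi_pi divides phi
        if not any(star(tau, beta, phi, p)):     # full rank: phi_pi = phi
            return phi


def kernel_vector(tau, ell, p, rng=random, tries=100):
    """Non-zero gamma with tau(gamma) = 0, for a singular tau."""
    for _ in range(tries):
        beta = [rng.randrange(p) for _ in range(ell)]
        if not any(beta):
            continue
        phi = seq_min_poly(tau, beta, p, rng)    # phi = [c_0, ..., c_m]
        if phi[0] == 0:                          # "good" beta
            return star(tau, beta, phi[1:], p)   # c_1 beta + ... + c_m tau^{m-1}(beta)
    raise RuntimeError("every beta tried was bad; is tau really singular?")
```

The cap `tries` is a safeguard added for the sketch: for a singular $\tau$, each trial is “bad” with probability at most $1/p$, so exhausting it is extremely unlikely.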

As a special case of this situation, consider the problem that arose in Chapter 15 in connection with algorithms for computing discrete logarithms and factoring. We had to solve the following problem: given an $\ell \times (\ell - 1)$ matrix $A$ with entries in a finite field $F$, containing $\ell^{1+o(1)}$ non-zero entries, find non-zero $v \in F^{1\times\ell}$ such that $vA = 0$. To solve this problem, we can augment the matrix $A$, adding an extra column of zeros, to get an $\ell \times \ell$ matrix $A'$. Now, let $V = F^{1\times\ell}$ and let $\tau$ be the $F$-linear map on $V$ that sends $\gamma \in V$ to $\gamma A'$. A non-zero solution $\gamma$ to the equation $\tau(\gamma) = 0$ will provide us with the solution to our original problem; thus, we can apply the above technique directly, solving this problem using $\ell^{2+o(1)}$ expected operations in $F$, and space for $\ell^{1+o(1)}$ elements of $F$. As a side remark, in this particular application, we can choose a “good” $\beta$ in the above algorithm without randomization: just choose $\beta := (0, \ldots, 0, 1)$, which is clearly not in the image of $\tau$ (every element of the image has last coordinate 0).


18.5 Computing minimal polynomials in F[X]/(f) (II) 

Let us return to the problem discussed in §17.2: $F$ is a field, $f \in F[X]$ is a monic polynomial of degree $\ell > 0$, and $E := F[X]/(f)$; we are given an element $\alpha \in E$, and want to compute the minimal polynomial $\phi \in F[X]$ of $\alpha$ over $F$. As discussed in Example 18.2, this problem is equivalent to the problem of computing the minimal polynomial of the sequence
$$\Psi := \{\alpha_i\}_{i=0}^{\infty} \qquad (\alpha_i := \alpha^i,\ i = 0, 1, \ldots),$$
and the sequence has full rank; therefore, we can use Algorithm MP in §18.3 directly to solve this problem, assuming $F$ is a finite field.

If we use the “just in time” strategy in the implementation of Algorithm MP, as was used in §18.4, we get an algorithm that computes the minimal polynomial of $\alpha$ using $O(\ell^3)$ expected operations in $F$, but space for just $O(\ell)$ elements of $F$. Thus, in terms of space, this approach is far superior to the algorithm in §17.2, based on Gaussian elimination. In terms of time complexity, the algorithm based on linearly generated sequences is a bit slower than the one based on Gaussian elimination (but only by a constant factor). However, if we use any subquadratic-time algorithm for polynomial arithmetic (see §17.6 and §17.7), we immediately get an algorithm that runs in subcubic time, while still using linear space. In the exercises below, you are asked to develop an algorithm that computes the minimal polynomial of $\alpha$ using just $O(\ell^{2.5})$ operations in $F$, at the expense of requiring space for $O(\ell^{1.5})$ elements of $F$; this algorithm does not rely on fast polynomial arithmetic, and can be made even faster if such arithmetic is used.
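For $F = \mathbb{Z}/p\mathbb{Z}$ (an assumption of this sketch), the “just in time” approach can be sketched in Python as follows. Elements of $E$ are coefficient lists of length $\ell$, the projected terms $\pi(\alpha^i)$ are generated one multiplication mod $f$ at a time, Berlekamp-Massey stands in for the algorithm of §18.2, and, as a simplification of Algorithm MP, a projection whose $\phi_\pi$ fails the test is discarded rather than folded into an lcm. The test itself is the $g \star \Psi = 0$ check, since $g \star \Psi = g(\alpha)$ for this sequence. Names are hypothetical, and the schoolbook multiplication used here gives $O(\ell^3)$ operations per iteration, not the faster variants mentioned above.

```python
import random


def mulmod(a, b, f, p):
    """a * b mod f over Z/pZ; a, b are coefficient lists of length ell = deg f,
    and f (constant term first) is monic."""
    ell = len(f) - 1
    r = [0] * (2 * ell - 1)
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                r[i + j] = (r[i + j] + x * y) % p
    for k in range(2 * ell - 2, ell - 1, -1):   # cancel X^k using monic f
        c = r[k]
        if c:
            for j in range(ell + 1):
                r[k - ell + j] = (r[k - ell + j] - c * f[j]) % p
    return r[:ell]


def berlekamp_massey(s, p):
    """Monic minimal polynomial (constant term first) of a scalar sequence."""
    n = len(s)
    C, B = [1] + [0] * n, [1] + [0] * n
    L, shift, b = 0, 1, 1
    for i in range(n):
        d = (s[i] + sum(C[j] * s[i - j] for j in range(1, L + 1))) % p
        if d == 0:
            shift += 1
            continue
        coef, T = d * pow(b, p - 2, p) % p, C[:]
        for j in range(shift, n + 1):
            C[j] = (C[j] - coef * B[j - shift]) % p
        if 2 * L <= i:
            L, B, b, shift = i + 1 - L, T, d, 1
        else:
            shift += 1
    return [C[L - j] for j in range(L + 1)]


def min_poly_in_E(alpha, f, p, rng=random):
    """Minimal polynomial over Z/pZ of alpha in E = (Z/pZ)[X]/(f)."""
    ell = len(f) - 1
    one = [1] + [0] * (ell - 1)
    while True:
        pi = [rng.randrange(p) for _ in range(ell)]
        z, power = [], one
        for _ in range(2 * ell):                  # z_i = pi(alpha^i), just in time
            z.append(sum(x * y for x, y in zip(power, pi)) % p)
            power = mulmod(power, alpha, f, p)
        phi = berlekamp_massey(z, p)              # phi_pi divides phi
        val = [0] * ell                           # Horner: val = phi_pi(alpha)
        for c in reversed(phi):
            val = mulmod(val, alpha, f, p)
            val[0] = (val[0] + c) % p
        if not any(val):                          # phi divides phi_pi: equal
            return phi
```

For example, in $E = \mathbb{Z}/7\mathbb{Z}[X]/(X^2 + 1)$ the element $1 + [X]$ has minimal polynomial $X^2 - 2X + 2$, which `min_poly_in_E([1, 1], [1, 0, 1], 7)` returns as `[2, 5, 1]`.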
Exercise 18.8. Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$ over a field $F$, and let $E := F[X]/(f)$. Also, let $\eta := [X]_f \in E$. For computational purposes, we assume that elements of $E$ and $D_F(E)$ are represented as coordinate vectors with respect to the usual “polynomial” basis $\{\eta^{i-1}\}_{i=1}^{\ell}$. For $\beta \in E$, let $M_\beta$ denote the $\beta$-multiplication map on $E$ that sends $\alpha \in E$ to $\alpha\beta \in E$, which is an $F$-linear map from $E$ into $E$.

(a) Given as input the polynomial $f$ defining $E$, along with a projection $\pi \in D_F(E)$ and an element $\beta \in E$, show how to compute the projection $\pi \circ M_\beta \in D_F(E)$, using $O(\ell^2)$ operations in $F$.

(b) Given as input the polynomial $f$ defining $E$, along with a projection $\pi \in D_F(E)$, an element $\alpha \in E$, and a parameter $k > 0$, show how to compute $(\pi(1), \pi(\alpha), \ldots, \pi(\alpha^{k-1}))$ using just $O(k\ell + k^{1/2}\ell^2)$ operations in $F$, and space for $O(k^{1/2}\ell)$ elements of $F$. Hint: use the same hint as in Exercise 17.3.

Exercise 18.9. Let $f \in F[X]$ be a monic polynomial over a finite field $F$ of degree $\ell > 0$, and let $E := F[X]/(f)$. Show how to use the result of the previous exercise, as well as Exercise 17.3, to get an algorithm that computes the minimal polynomial of $\alpha \in E$ over $F$ using $O(\ell^{2.5})$ expected operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$.

Exercise 18.10. Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$ over a field $F$ (not necessarily finite), and let $E := F[X]/(f)$. Further, suppose that $f$ is irreducible, so that $E$ is itself a field. Show how to compute the minimal polynomial of $\alpha \in E$ over $F$ deterministically, using algorithms that satisfy the following complexity bounds:

(a) $O(\ell^3)$ operations in $F$ and space for $O(\ell)$ elements of $F$;

(b) $O(\ell^{2.5})$ operations in $F$ and space for $O(\ell^{1.5})$ elements of $F$.


18.6 The algebra of linear transformations (*) 

Throughout this chapter, one could hear the whispers of the algebra of linear trans- 
formations. We develop some of the aspects of this theory here, leaving a number 
of details as exercises. It will not play a role in any material that follows, but it 
serves to provide the reader with a “bigger picture.” 

Let $F$ be a field and $V$ be an $F$-vector space. We denote by $\mathcal{L}_F(V)$ the set of all $F$-linear maps from $V$ into $V$. Thus, $\mathcal{L}_F(V) = \operatorname{Hom}_F(V, V)$, and is a vector space over $F$, with addition and scalar multiplication defined point-wise (see Theorem 13.12). Elements of $\mathcal{L}_F(V)$ are called linear transformations.

For $\tau, \tau' \in \mathcal{L}_F(V)$, the composed map, $\tau \circ \tau'$, which sends $\alpha \in V$ to $\tau(\tau'(\alpha))$, is also an element of $\mathcal{L}_F(V)$. As always, function composition is associative (i.e., for $\tau, \tau', \tau'' \in \mathcal{L}_F(V)$, we have $\tau \circ (\tau' \circ \tau'') = (\tau \circ \tau') \circ \tau''$); however, function composition is not in general commutative (i.e., we may have $\tau \circ \tau' \ne \tau' \circ \tau$ for some $\tau, \tau' \in \mathcal{L}_F(V)$). The following theorem considers the interaction between composition, addition, and scalar multiplication.

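Theorem 18.6. For all $\tau, \tau', \tau'' \in \mathcal{L}_F(V)$, and for all $c \in F$, we have:

(i) $\tau \circ (\tau' + \tau'') = \tau \circ \tau' + \tau \circ \tau''$;

(ii) $(\tau' + \tau'') \circ \tau = \tau' \circ \tau + \tau'' \circ \tau$;

(iii) $(c\tau) \circ \tau' = c(\tau \circ \tau') = \tau \circ (c\tau')$.

Proof. Exercise. □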

Under the addition operation and scalar multiplication of the vector space $\mathcal{L}_F(V)$, and defining multiplication on $\mathcal{L}_F(V)$ using the “$\circ$” operation, the previous theorem implies that $\mathcal{L}_F(V)$ satisfies all the properties of an $F$-algebra (see Definition 16.1), except for the fact that multiplication is not commutative (the identity map acts as the multiplicative identity). Thus, we can think of $\mathcal{L}_F(V)$ as a non-commutative $F$-algebra.

Let $\tau \in \mathcal{L}_F(V)$ be a linear transformation. For each integer $i \ge 0$, the map $\tau^i$ (i.e., the $i$-fold composition of $\tau$) is also an element of $\mathcal{L}_F(V)$. Note that $\tau^0$ is by definition just the identity map on $V$. For each polynomial $g \in F[X]$, with $g = \sum_i a_i X^i$, we denote by $g(\tau)$ the linear transformation
$$g(\tau) := \sum_i a_i \tau^i \in \mathcal{L}_F(V).$$
Thus, for $\alpha \in V$, the value of $g(\tau)$ at $\alpha$ is $\sum_i a_i \tau^i(\alpha)$.

Theorem 18.7. For all $\tau \in \mathcal{L}_F(V)$, for all $c \in F$, and for all $g, h \in F[X]$, we have:

(i) $g(\tau) + h(\tau) = (g + h)(\tau)$;

(ii) $c \cdot g(\tau) = (cg)(\tau)$;

(iii) $g(\tau) \circ h(\tau) = (gh)(\tau) = h(\tau) \circ g(\tau)$.

Proof. Exercise. □

Let $\tau \in \mathcal{L}_F(V)$ be a linear transformation. We define
$$F[\tau] := \{g(\tau) : g \in F[X]\},$$
which is a subset of $\mathcal{L}_F(V)$. By the previous theorem, it is clear that $F[\tau]$ is closed under addition, multiplication (i.e., composition), and scalar multiplication, and that $F[\tau]$ is in fact an $F$-algebra in the usual sense (i.e., multiplication is commutative). Moreover, the expressions $F[\tau]$ and $g(\tau)$ (for $g \in F[X]$) have the same meaning as in §16.1.

Let $\phi_\tau$ be the minimal polynomial of $\tau$ over $F$, so that $F[\tau]$ is isomorphic as an $F$-algebra to $F[X]/(\phi_\tau)$. We can also characterize $\phi_\tau$ as follows:

    if there exists a non-zero polynomial $g \in F[X]$ such that $g(\tau) = 0$, then $\phi_\tau$ is the monic polynomial of least degree with this property; otherwise, $\phi_\tau = 0$.

Another way to characterize $\phi_\tau$ is as follows:

    $\phi_\tau$ is the minimal polynomial of the sequence $\{\tau^i\}_{i=0}^{\infty}$.

If $V$ has finite dimension $\ell > 0$, then by Theorem 14.4, $\mathcal{L}_F(V)$ is isomorphic as an $F$-vector space to $F^{\ell\times\ell}$, and so in particular, has dimension $\ell^2$. Therefore, there must be a linear dependence among $1, \tau, \ldots, \tau^{\ell^2}$, which implies that the minimal polynomial of $\tau$ is non-zero with degree at most $\ell^2$ (and at least 1). We shall show below that in this case, the minimal polynomial of $\tau$ actually has degree at most $\ell$.

For a fixed $\tau \in \mathcal{L}_F(V)$, we can define a “scalar multiplication” operation $\odot$ that maps $g \in F[X]$ and $\alpha \in V$ to
$$g \odot \alpha := g(\tau)(\alpha) \in V;$$
that is, if $g = \sum_i a_i X^i$, then
$$g \odot \alpha = \sum_i a_i \tau^i(\alpha).$$

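Theorem 18.8. The scalar multiplication $\odot$, together with the usual addition operation on $V$, makes $V$ into an $F[X]$-module; that is, for all $g, h \in F[X]$ and $\alpha, \beta \in V$, we have
$$g \odot (h \odot \alpha) = (gh) \odot \alpha, \qquad (g + h) \odot \alpha = g \odot \alpha + h \odot \alpha,$$
$$g \odot (\alpha + \beta) = g \odot \alpha + g \odot \beta, \qquad 1 \odot \alpha = \alpha.$$

Proof. Exercise. □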

Note that each choice of $\tau$ gives rise to a different $F[X]$-module structure, but all of these structures are extensions of the usual vector space structure, in the sense that for all $c \in F$ and $\alpha \in V$, we have $c \odot \alpha = c\alpha$.
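To see the module structure concretely, here is a small Python sketch. It is an illustration only: it assumes $F = \mathbb{Z}/p\mathbb{Z}$ and $\tau(\alpha) = \alpha A$ for a given matrix $A$, and the names are hypothetical. It computes $g \odot \alpha = g(\tau)(\alpha)$ by Horner's rule, using only evaluations of $\tau$, and can be used to spot-check identities of Theorem 18.8 such as $g \odot (h \odot \alpha) = (gh) \odot \alpha$.

```python
def mat_apply(A, alpha, p):
    """tau(alpha) = alpha * A over Z/pZ (row-vector convention)."""
    n = len(alpha)
    return [sum(alpha[i] * A[i][j] for i in range(n)) % p for j in range(n)]


def odot(g, alpha, A, p):
    """g (.) alpha = g(tau)(alpha) = sum_i g[i] tau^i(alpha), by Horner's rule."""
    out = [0] * len(alpha)
    for c in reversed(g):
        out = [(x + c * a) % p for x, a in zip(mat_apply(A, out, p), alpha)]
    return out


def poly_mul(g, h, p):
    """Product of two coefficient lists (constant term first) over Z/pZ."""
    r = [0] * (len(g) + len(h) - 1)
    for i, x in enumerate(g):
        for j, y in enumerate(h):
            r[i + j] = (r[i + j] + x * y) % p
    return r
```

A constant polynomial acts as an ordinary scalar: `odot([5], alpha, A, p)` is just $5\alpha$, matching $c \odot \alpha = c\alpha$.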

Now, for fixed $\tau \in \mathcal{L}_F(V)$ and $\alpha \in V$, consider the $F[X]$-linear map $\rho_{\tau,\alpha} : F[X] \to V$ that sends $g \in F[X]$ to $g \odot \alpha = g(\tau)(\alpha)$. The kernel of this map must be a submodule, and hence an ideal, of $F[X]$; since every ideal of $F[X]$ is principal, it follows that $\operatorname{Ker}\rho_{\tau,\alpha}$ is the ideal of $F[X]$ generated by some polynomial $\phi_{\tau,\alpha}$, which we can make unique by insisting that it is monic or zero. We call $\phi_{\tau,\alpha}$ the minimal polynomial of $\alpha$ under $\tau$. We can also characterize $\phi_{\tau,\alpha}$ as follows:

    if there exists a non-zero polynomial $g \in F[X]$ such that $g(\tau)(\alpha) = 0$, then $\phi_{\tau,\alpha}$ is the monic polynomial of least degree with this property; otherwise, $\phi_{\tau,\alpha} = 0$.

Another way to characterize $\phi_{\tau,\alpha}$ is as follows:

    $\phi_{\tau,\alpha}$ is the minimal polynomial of the sequence $\{\tau^i(\alpha)\}_{i=0}^{\infty}$.

Note that since $\phi_\tau(\tau)$ is the zero map, we have
$$\phi_\tau \odot \alpha = \phi_\tau(\tau)(\alpha) = 0,$$
and hence $\phi_\tau \in \operatorname{Ker}\rho_{\tau,\alpha}$, which means that $\phi_{\tau,\alpha} \mid \phi_\tau$.

Now consider the image of $\rho_{\tau,\alpha}$, which we shall denote by $\langle\alpha\rangle_\tau$. As an $F[X]$-module, $\langle\alpha\rangle_\tau$ is isomorphic to $F[X]/(\phi_{\tau,\alpha})$. In particular, if $\phi_{\tau,\alpha}$ is non-zero and has degree $m$, then $\langle\alpha\rangle_\tau$ is a vector space of dimension $m$ over $F$; indeed, the elements $\alpha, \tau(\alpha), \ldots, \tau^{m-1}(\alpha)$ form a basis for $\langle\alpha\rangle_\tau$ over $F$; moreover, $m$ is the smallest non-negative integer such that $\{\tau^i(\alpha)\}_{i=0}^{m}$ is linearly dependent.
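The last characterization suggests a direct way to compute $\phi_{\tau,\alpha}$ when $V$ is finite dimensional: apply $\tau$ repeatedly and stop at the first $\tau^m(\alpha)$ that is a linear combination of $\alpha, \ldots, \tau^{m-1}(\alpha)$. The Python sketch below assumes $F = \mathbb{Z}/p\mathbb{Z}$ and $\tau(\alpha) = \alpha A$, and its names are hypothetical. It does this with incremental Gaussian elimination, tracking how each reduced vector is expressed in terms of the $\tau^j(\alpha)$. It is a dense method using $O(\ell^2)$ space, so it illustrates the definition rather than the space-efficient algorithms of §18.4.

```python
def mat_apply(A, alpha, p):
    """tau(alpha) = alpha * A over Z/pZ (row-vector convention)."""
    n = len(alpha)
    return [sum(alpha[i] * A[i][j] for i in range(n)) % p for j in range(n)]


def min_poly_under(A, alpha, p):
    """Return (phi_{tau,alpha}, m): the minimal polynomial of alpha under
    tau(x) = x A (constant term first) and m = dim <alpha>_tau."""
    basis = []                       # (pivot, reduced vector, its combination)
    vec_i, i = list(alpha), 0        # vec_i = tau^i(alpha)
    while True:
        # invariant: vec = sum_j comb[j] * tau^j(alpha)
        vec, comb = vec_i[:], [0] * i + [1]
        for piv, bvec, bcomb in basis:
            c = vec[piv]
            if c:
                vec = [(x - c * y) % p for x, y in zip(vec, bvec)]
                bcomb = bcomb + [0] * (len(comb) - len(bcomb))
                comb = [(x - c * y) % p for x, y in zip(comb, bcomb)]
        piv = next((k for k, x in enumerate(vec) if x), None)
        if piv is None:              # tau^i(alpha) depends on earlier powers
            return comb, i
        inv = pow(vec[piv], p - 2, p)
        basis.append((piv, [x * inv % p for x in vec], [x * inv % p for x in comb]))
        vec_i, i = mat_apply(A, vec_i, p), i + 1
```

For the identity map, every non-zero $\alpha$ has minimal polynomial $X - 1$ and $\langle\alpha\rangle_\tau$ is the line spanned by $\alpha$; over $\mathbb{Z}/7\mathbb{Z}$ the sketch returns `([6, 1], 1)` in that case.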

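Observe that for every $\beta \in \langle\alpha\rangle_\tau$, we have $\phi_{\tau,\alpha} \odot \beta = 0$; indeed, if $\beta = g \odot \alpha$, then
$$\phi_{\tau,\alpha} \odot (g \odot \alpha) = (\phi_{\tau,\alpha} g) \odot \alpha = g \odot (\phi_{\tau,\alpha} \odot \alpha) = g \odot 0 = 0.$$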

The following three theorems develop some simple facts; the proofs of these are straightforward, and left as exercises. In each theorem, $\tau$ is an element of $\mathcal{L}_F(V)$, and $\odot$ is the associated scalar multiplication that makes $V$ into an $F[X]$-module.

Theorem 18.9. Let $\alpha \in V$ have minimal polynomial $f \in F[X]$ under $\tau$, and let $\beta \in V$ have minimal polynomial $g \in F[X]$ under $\tau$. If $\gcd(f, g) = 1$, then $\langle\alpha\rangle_\tau \cap \langle\beta\rangle_\tau = \{0\}$, and $\alpha + \beta$ has minimal polynomial $f \cdot g$ under $\tau$.

Theorem 18.10. Let $\alpha \in V$. Let $f \in F[X]$ be a monic irreducible polynomial such that $f^e \odot \alpha = 0$ but $f^{e-1} \odot \alpha \ne 0$ for some integer $e \ge 1$. Then $f^e$ is the minimal polynomial of $\alpha$ under $\tau$.

Theorem 18.11. Let $\alpha \in V$, and suppose that $\alpha$ has minimal polynomial $f \in F[X]$ under $\tau$, with $f \ne 0$. Let $g \in F[X]$. Then $g \odot \alpha$ has minimal polynomial $f/\gcd(f, g)$ under $\tau$.

We are now ready to state the main result of this section, whose statement and proof are analogous to that of Theorem 6.41:

Theorem 18.12. Let $\tau \in \mathcal{L}_F(V)$, and suppose that $\tau$ has non-zero minimal polynomial $\phi$. Then there exists $\beta \in V$ such that the minimal polynomial of $\beta$ under $\tau$ is $\phi$.
Proof. Let $\odot$ be the scalar multiplication associated with $\tau$. Let $\phi = \phi_1^{e_1} \cdots \phi_r^{e_r}$ be the factorization of $\phi$ into monic irreducible polynomials in $F[X]$.

First, we claim that for each $i = 1, \ldots, r$, there exists $\alpha_i \in V$ such that $(\phi/\phi_i) \odot \alpha_i \ne 0$. Suppose the claim were false: then for some $i$, we would have $(\phi/\phi_i) \odot \alpha = 0$ for all $\alpha \in V$; however, this means that $(\phi/\phi_i)(\tau) = 0$, contradicting the minimality property in the definition of the minimal polynomial $\phi$. That proves the claim.

Let $\alpha_1, \ldots, \alpha_r$ be as in the above claim. Then by Theorem 18.10, each $(\phi/\phi_i^{e_i}) \odot \alpha_i$ has minimal polynomial $\phi_i^{e_i}$ under $\tau$. Finally, by Theorem 18.9,
$$\beta := (\phi/\phi_1^{e_1}) \odot \alpha_1 + \cdots + (\phi/\phi_r^{e_r}) \odot \alpha_r$$
has minimal polynomial $\phi$ under $\tau$. □

Theorem 18.12 says that if $\tau$ has non-zero minimal polynomial $\phi$ of degree $m$, then there exists $\beta \in V$ such that $\{\tau^i(\beta)\}_{i=0}^{m-1}$ is linearly independent. From this, it immediately follows that:

Theorem 18.13. If $V$ has finite dimension $\ell > 0$, then for every $\tau \in \mathcal{L}_F(V)$, the minimal polynomial of $\tau$ is non-zero of degree at most $\ell$.

We close this section with a simple observation. Let $V$ be an arbitrary $F[X]$-module with scalar multiplication $\odot$. Restricting the scalar multiplication from $F[X]$ to $F$, we can naturally view $V$ as an $F$-vector space. Let $\tau : V \to V$ be the map that sends $\alpha \in V$ to $X \odot \alpha$. It is easy to see that $\tau \in \mathcal{L}_F(V)$, and that for all polynomials $g \in F[X]$, and all $\alpha \in V$, we have $g \odot \alpha = g(\tau)(\alpha)$. Thus, instead of starting with a vector space and defining an $F[X]$-module structure in terms of a given linear map, we can go the other direction, starting from an $F[X]$-module and obtaining a corresponding linear map. Furthermore, using the language introduced in Examples 13.19 and 13.20, we see that the $F[X]$-exponent of $V$ is the ideal of $F[X]$ generated by the minimal polynomial of $\tau$, and the $F[X]$-order of any element $\alpha \in V$ is the ideal of $F[X]$ generated by the minimal polynomial of $\alpha$ under $\tau$. Theorem 18.12 says that there exists an element in $V$ whose $F[X]$-order is equal to the $F[X]$-exponent of $V$, assuming the latter is non-zero.

So depending on one’s mood, one can place emphasis either on the linear map $\tau$, or just talk about $F[X]$-modules without mentioning any linear maps.


Exercise 18.11. Let $\tau \in \mathcal{L}_F(V)$ have non-zero minimal polynomial $\phi$ of degree $m$, and let $\phi = \phi_1^{e_1} \cdots \phi_r^{e_r}$ be the factorization of $\phi$ into monic irreducible polynomials in $F[X]$. Let $\odot$ be the scalar multiplication associated with $\tau$. Show that $\beta \in V$ has minimal polynomial $\phi$ under $\tau$ if and only if $(\phi/\phi_i) \odot \beta \ne 0$ for $i = 1, \ldots, r$.
Exercise 18.12. Let $\tau \in \mathcal{L}_F(V)$ have non-zero minimal polynomial $\phi$. Show that $\tau$ is bijective if and only if $X \nmid \phi$.

Exercise 18.13. Let $F$ be a finite field, and let $V$ have finite dimension $\ell > 0$ over $F$. Let $\tau \in \mathcal{L}_F(V)$ have minimal polynomial $\phi$, with $\deg(\phi) = m$ (and of course, by Theorem 18.13, we have $m \le \ell$). Suppose that $\alpha_1, \ldots, \alpha_s$ are randomly chosen elements of $V$. Let $g_j$ be the minimal polynomial of $\alpha_j$ under $\tau$, for $j = 1, \ldots, s$. Let $Q$ be the probability that $\operatorname{lcm}(g_1, \ldots, g_s) = \phi$. The goal of this exercise is to show that $Q \ge \Lambda^F_\phi(s)$, where $\Lambda^F_\phi(s)$ is as defined in §18.3.

(a) Using Theorem 18.12 and Theorem 18.11, show that if $m = \ell$, then $Q = \Lambda^F_\phi(s)$.

(b) Without the assumption that $m = \ell$, things are a bit more challenging. Adopting the matrix-oriented point of view discussed at the end of §18.3, and transposing everything, show that

  - there exists $\pi \in D_F(V)$ such that the sequence $\{\pi \circ \tau^i\}_{i=0}^{\infty}$ has minimal polynomial $\phi$, and

  - if, for $j = 1, \ldots, s$, we define $h_j$ to be the minimal polynomial of the sequence $\{\pi(\tau^i(\alpha_j))\}_{i=0}^{\infty}$, then the probability that $\operatorname{lcm}(h_1, \ldots, h_s) = \phi$ is equal to $\Lambda^F_\phi(s)$.

(c) Show that $h_j \mid g_j$, for $j = 1, \ldots, s$, and conclude that $Q \ge \Lambda^F_\phi(s)$.

Exercise 18.14. Let $f, g \in F[X]$ with $f \ne 0$, and let $h := f/\gcd(f, g)$. Show that $g \cdot F[X]/(f)$ and $F[X]/(h)$ are isomorphic as $F[X]$-modules.

Exercise 18.15. In this exercise, you are to derive the fundamental theorem of finite dimensional $F[X]$-modules, which is completely analogous to the fundamental theorem of finite abelian groups. Both of these results are really special cases of a more general decomposition theorem for modules over a principal ideal domain. Let $V$ be an $F[X]$-module. Assume that as an $F$-vector space, $V$ has finite dimension $\ell > 0$, and that the $F[X]$-exponent of $V$ is generated by the monic polynomial $\phi \in F[X]$ (note that $1 \le \deg(\phi) \le \ell$). Show that there exist monic, non-constant polynomials $\phi_1, \ldots, \phi_t \in F[X]$ such that

• $\phi_i \mid \phi_{i+1}$ for $i = 1, \ldots, t-1$, and

• $V$ is isomorphic, as an $F[X]$-module, to the direct product of $F[X]$-modules
$$V' := F[X]/(\phi_1) \times \cdots \times F[X]/(\phi_t).$$

Moreover, show that the polynomials $\phi_1, \ldots, \phi_t$ satisfying these conditions are uniquely determined, and that $\phi_t = \phi$. Hint: one can just mimic the proof of Theorem 6.45, where the exponent of a group corresponds to the $F[X]$-exponent of an $F[X]$-module, and the order of a group element corresponds to the $F[X]$-order of an element of an $F[X]$-module; everything translates rather directly, with just a few minor, technical differences, and the previous exercise is useful in proving the uniqueness part of the theorem.


Exercise 18.16. Let us adopt the same assumptions and notation as in Exercise 18.15, and let $\tau \in \mathcal{L}_F(V)$ be the map that sends $\alpha \in V$ to $X \odot \alpha$. Further, let $\rho : V \to V'$ be the isomorphism of that exercise, and let $\tau' \in \mathcal{L}_F(V')$ be the $X$-multiplication map on $V'$.

(a) Show that $\rho \circ \tau = \tau' \circ \rho$.

(b) From part (a), derive the following: there exists a basis for $V$ over $F$, with respect to which the matrix of $\tau$ is the "block diagonal" matrix
$$T = \begin{pmatrix} C_1 & & & \\ & C_2 & & \\ & & \ddots & \\ & & & C_t \end{pmatrix},$$
where each $C_i$ is the companion matrix of $\phi_i$ (see Example 14.1).


Exercise 18.17. Let us adopt the same assumptions and notation as in Exercise 18.15.

(a) Using the result of that exercise, show that $V$ is isomorphic, as an $F[X]$-module, to a direct product of $F[X]$-modules
$$F[X]/(f_1^{e_1}) \times \cdots \times F[X]/(f_r^{e_r}),$$
where the $f_i$'s are monic irreducible polynomials (not necessarily distinct) and the $e_i$'s are positive integers, and this direct product is unique up to the order of the factors.

(b) Using part (a), show that there exists a basis for $V$ over $F$, with respect to which the matrix of $\tau$ is the "block diagonal" matrix
$$T = \begin{pmatrix} C'_1 & & & \\ & C'_2 & & \\ & & \ddots & \\ & & & C'_r \end{pmatrix},$$
where each $C'_i$ is the companion matrix of $f_i^{e_i}$.

Exercise 18.18. Let us adopt the same assumptions and notation as in Exercise 18.15.

(a) Suppose $\alpha \in V$ corresponds to $([g_1]_{\phi_1}, \ldots, [g_t]_{\phi_t}) \in V'$ under the isomorphism of that exercise. Show that the $F[X]$-order of $\alpha$ is generated by the polynomial
$$\operatorname{lcm}\big(\phi_1 / \gcd(g_1, \phi_1), \ldots, \phi_t / \gcd(g_t, \phi_t)\big).$$

(b) Using part (a), give a short and simple proof of the result of Exercise 18.13.


18.7 Notes 

Berlekamp [15] and Massey [64] discuss an algorithm for finding the minimal polynomial of a linearly generated sequence that is closely related to the one presented in §18.2, and which has a similar complexity. This connection between Euclid's algorithm and finding minimal polynomials of linearly generated sequences has been observed by many authors, including Mills [68], Welch and Scholtz [108], and Dornstetter [36].

The algorithm presented in §18.3 is due to Wiedemann [109], as are the algorithms for solving sparse linear systems in §18.4, as well as the statement and proof outline of the result in Exercise 18.13.

Our proof of Theorem 18.4 is based on an exposition by Morrison [69]. 

Using fast matrix and polynomial arithmetic, Shoup [96] shows how to implement the algorithms in §18.5 so as to use just $O(\ell^{(\omega+1)/2})$ operations in $F$, where $\omega$ is the exponent for matrix multiplication (see §14.6), and so $(\omega + 1)/2 < 1.7$.†

† The running times of these algorithms can be improved using faster algorithms for modular composition; see footnote on p. 485.
This chapter develops some of the basic theory of finite fields. As we already know (see Theorem 7.7), every finite field must be of cardinality $p^w$, for some prime $p$ and positive integer $w$. The main results of this chapter are:

• for every prime $p$ and positive integer $w$, there exists a finite field of cardinality $p^w$, and

• any two finite fields of the same cardinality are isomorphic.


19.1 Preliminaries 

We begin by stating some simple but useful divisibility criteria for polynomials 
over an arbitrary field. These will play a crucial role in the development of the 
theory. 

Let $F$ be a field. A polynomial $f \in F[X]$ is called square-free if it is not divisible by the square of any polynomial of degree greater than zero. Using formal derivatives (see §16.7), we obtain the following useful criterion for establishing that a polynomial is square-free:

Theorem 19.1. If $F$ is a field, and $f \in F[X]$ with $\gcd(f, \mathbf{D}(f)) = 1$, then $f$ is square-free.

Proof. Suppose $f$ is not square-free, and write $f = g^2 h$, for $g, h \in F[X]$ with $\deg(g) > 0$. Taking formal derivatives, we have
$$\mathbf{D}(f) = 2g\,\mathbf{D}(g)\,h + g^2\,\mathbf{D}(h),$$
and so clearly, $g$ is a common divisor of $f$ and $\mathbf{D}(f)$. □
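Theorem 19.1 turns square-freeness into a gcd computation, which is easy to mechanize. The following is a minimal Python sketch, assuming $F = \mathbb{Z}_p$ for a prime $p$ and storing a polynomial as a list of coefficients with the constant term first; all function names are hypothetical. Since the theorem gives a sufficient condition, the sketch only reports whether $\gcd(f, \mathbf{D}(f)) = 1$.

```python
def trim(a):
    """Drop trailing zero coefficients in place (lists are lowest degree first)."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_rem(a, b, p):
    """Remainder of a divided by a non-zero polynomial b, coefficients in Z_p."""
    a = trim([x % p for x in a])
    b = trim([x % p for x in b])
    inv_lead = pow(b[-1], -1, p)  # inverse of the leading coefficient (Python 3.8+)
    while len(a) >= len(b):
        c = a[-1] * inv_lead % p
        shift = len(a) - len(b)
        for j, y in enumerate(b):
            a[shift + j] = (a[shift + j] - c * y) % p
        trim(a)  # the leading term has been cancelled
    return a


def poly_gcd(a, b, p):
    """Monic gcd of a and b over Z_p, by Euclid's algorithm."""
    a, b = trim([x % p for x in a]), trim([x % p for x in b])
    while b:
        a, b = b, poly_rem(a, b, p)
    if a:
        inv_lead = pow(a[-1], -1, p)
        a = [x * inv_lead % p for x in a]
    return a


def derivative(f, p):
    """Formal derivative D(f) over Z_p."""
    return trim([i * f[i] % p for i in range(1, len(f))])


def gcd_with_derivative_is_one(f, p):
    """True iff gcd(f, D(f)) = 1, which by Theorem 19.1 implies f is square-free."""
    return poly_gcd(f, derivative(f, p), p) == [1]
```

Over $\mathbb{Z}_3$, `gcd_with_derivative_is_one([1, 0, 1], 3)` is `True`, so $X^2 + 1$ is square-free there; over $\mathbb{Z}_2$ the derivative of $X^2 + 1$ vanishes and the same call with `p = 2` returns `False`, consistent with $X^2 + 1 = (X + 1)^2$.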

Theorem 19.2. Let $F$ be a field, and let $k, \ell$ be positive integers. Then $X^k - 1$ divides $X^\ell - 1$ in $F[X]$ if and only if $k$ divides $\ell$.

Proof. Let $\ell = kq + r$, with $0 \le r < k$. We have
$$X^\ell = X^{kq} X^r \equiv X^r \pmod{X^k - 1},$$
and $X^r \equiv 1 \pmod{X^k - 1}$ if and only if $r = 0$. □

Theorem 19.3. Let $a \ge 2$ be an integer and let $k, \ell$ be positive integers. Then $a^k - 1$ divides $a^\ell - 1$ if and only if $k$ divides $\ell$.

Proof. The proof is analogous to that of Theorem 19.2. We leave the details to the reader. □
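A quick numerical spot-check of Theorem 19.3 in Python, mirroring the proof of Theorem 19.2: writing $\ell = kq + r$ with $0 \le r < k$, the remainder of $a^\ell - 1$ modulo $a^k - 1$ is exactly $a^r - 1$. The function name is hypothetical, and a finite search only illustrates the statement; it does not prove it.

```python
def check_theorem_19_3(max_a=6, max_exp=20):
    """Search small a, k, ell for a counterexample to Theorem 19.3; return None if none."""
    for a in range(2, max_a + 1):
        for k in range(1, max_exp + 1):
            for ell in range(1, max_exp + 1):
                rem = (a ** ell - 1) % (a ** k - 1)
                # analogue of X^ell = X^r (mod X^k - 1): the remainder is a^(ell mod k) - 1
                assert rem == a ** (ell % k) - 1
                if (rem == 0) != (ell % k == 0):
                    return (a, k, ell)
    return None


print(check_theorem_19_3())  # None
```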

One may combine these last two theorems, obtaining: 

Theorem 19.4. Let $a \ge 2$ be an integer, $k, \ell$ be positive integers, and $F$ a field. Then $X^{a^k} - X$ divides $X^{a^\ell} - X$ in $F[X]$ if and only if $k$ divides $\ell$.

Proof. Now, $X^{a^k} - X$ divides $X^{a^\ell} - X$ if and only if $X^{a^k - 1} - 1$ divides $X^{a^\ell - 1} - 1$. By Theorem 19.2, this happens if and only if $a^k - 1$ divides $a^\ell - 1$. By Theorem 19.3, this happens if and only if $k$ divides $\ell$. □

We end this section by recalling some concepts discussed earlier, mainly in §16.1, §16.5, and §16.6, that will play an important role in this chapter.

Suppose $F$ is a field, and $E$ is an extension field of $F$; that is, $F$ is a subfield of $E$ (or, more generally, $F$ is embedded in $E$ via some canonical embedding, and we identify elements of $F$ with their images in $E$ under this embedding). We may view $E$ as an $F$-algebra via inclusion, and in particular, as an $F$-vector space. If $E'$ is also an extension field of $F$, and $\rho : E \to E'$ is a ring homomorphism, then $\rho$ is an $F$-algebra homomorphism if and only if $\rho(a) = a$ for all $a \in F$.

Let us further assume that as an $F$-vector space, $E$ has finite dimension $\ell$. This dimension $\ell$ is called the degree of $E$ over $F$, and is denoted $(E : F)$, and $E$ is called a finite extension of $F$. Now consider an element $\alpha \in E$. Then $\alpha$ is algebraic over $F$, which means that there exists a non-zero polynomial $g \in F[X]$ such that $g(\alpha) = 0$. The monic polynomial $\phi \in F[X]$ of least degree such that $\phi(\alpha) = 0$ is called the minimal polynomial of $\alpha$ over $F$. The polynomial $\phi$ is irreducible over $F$, and its degree $m := \deg(\phi)$ is called the degree of $\alpha$ over $F$. The ring $F[\alpha] = \{g(\alpha) : g \in F[X]\}$, which is the smallest subring of $E$ containing $F$ and $\alpha$, is actually a field, and is isomorphic, as an $F$-algebra, to $F[X]/(\phi)$, via the map that sends $g(\alpha) \in F[\alpha]$ to $[g]_\phi \in F[X]/(\phi)$. In particular, $(F[\alpha] : F) = m$, and the elements $1, \alpha, \ldots, \alpha^{m-1}$ form a basis for $F[\alpha]$ over $F$. Moreover, $m$ divides $\ell$.
19.2 The existence of finite fields

Let $F$ be a finite field. As we saw in Theorem 7.7, $F$ must have cardinality $p^w$, where $p$ is prime and $w$ is a positive integer, and $p$ is the characteristic of $F$. However, we can say a bit more than this. As discussed in Example 7.53, the field $\mathbb{Z}_p$ is embedded in $F$, and so we may simply view $\mathbb{Z}_p$ as a subfield of $F$. Moreover, it must be the case that $w$ is equal to $(F : \mathbb{Z}_p)$.

We want to show that there exist finite fields of every prime-power cardinality. Actually, we shall prove a more general result:

If $F$ is a finite field, then for every integer $\ell \ge 1$, there exists an extension field $E$ of degree $\ell$ over $F$.

For the remainder of this section, $F$ denotes a finite field of cardinality $q = p^w$, where $p$ is prime and $w \ge 1$.

Suppose for the moment that $E$ is an extension of degree $\ell$ over $F$. Let us derive some basic facts about $E$. First, observe that $E$ has cardinality $q^\ell$. By Theorem 7.29, $E^*$ is cyclic, and the order of $E^*$ is $q^\ell - 1$. If $\gamma \in E^*$ is a generator for $E^*$, then every non-zero element of $E$ can be expressed as a power of $\gamma$; in particular, every element of $E$ can be expressed as a polynomial in $\gamma$ with coefficients in $F$; that is, $E = F[\gamma]$. Let $\phi \in F[X]$ be the minimal polynomial of $\gamma$ over $F$, which is an irreducible polynomial of degree $\ell$. It follows that $E$ is isomorphic (as an $F$-algebra) to $F[X]/(\phi)$.

So we have shown that every extension of degree $\ell$ over $F$ must be isomorphic, as an $F$-algebra, to $F[X]/(f)$ for some irreducible polynomial $f \in F[X]$ of degree $\ell$. Conversely, given any irreducible polynomial $f$ over $F$ of degree $\ell$, we can construct the finite field $F[X]/(f)$, which has degree $\ell$ over $F$. Thus, the question of the existence of a finite field of degree $\ell$ over $F$ reduces to the question of the existence of an irreducible polynomial over $F$ of degree $\ell$.

We begin with a simple generalization of Fermat’s little theorem: 

Theorem 19.5. For every $a \in F$, we have $a^q = a$.

Proof. The multiplicative group of units $F^*$ of $F$ has order $q - 1$, and hence, every $a \in F^*$ satisfies the equation $a^{q-1} = 1$. Multiplying this equation by $a$ yields $a^q = a$ for all $a \in F^*$, and this latter equation obviously holds for $a = 0$ as well. □

This simple fact has a number of consequences. 

Theorem 19.6. We have
$$X^q - X = \prod_{a \in F} (X - a).$$

Proof. Since each $a \in F$ is a root of $X^q - X$, by Theorem 7.13, the polynomial $\prod_{a \in F} (X - a)$ divides the polynomial $X^q - X$. Since the degrees and leading coefficients of these two polynomials are the same, the two polynomials must be equal. □

Theorem 19.7. Let $E$ be an $F$-algebra. Then the map $\sigma : E \to E$ that sends $\alpha \in E$ to $\alpha^q$ is an $F$-algebra homomorphism.

Proof. By Theorem 16.3, either $E$ is trivial or contains an isomorphic copy of $F$ as a subring. In the former case, there is nothing to prove. So assume that $E$ contains an isomorphic copy of $F$ as a subring. It follows that $E$ must have characteristic $p$.

Since $q = p^w$, we see that $\sigma = \tau^w$, where $\tau(\alpha) := \alpha^p$. By the discussion in Example 7.48, the map $\tau$ is a ring homomorphism, and hence so is $\sigma$. Moreover, by Theorem 19.5, we have
$$\sigma(c 1_E) = (c 1_E)^q = c^q 1_E^q = c 1_E$$
for all $c \in F$. Thus (see Theorem 16.5), $\sigma$ is an $F$-algebra homomorphism. □

The map $\sigma$ defined in Theorem 19.7 is called the Frobenius map on $E$ over $F$. In the case where $E$ is a finite field, we can say more about it:

Theorem 19.8. Let $E$ be a finite extension of $F$, and let $\sigma$ be the Frobenius map on $E$ over $F$. Then $\sigma$ is an $F$-algebra automorphism on $E$. Moreover, for all $\alpha \in E$, we have $\sigma(\alpha) = \alpha$ if and only if $\alpha \in F$.

Proof. The fact that $\sigma$ is an $F$-algebra homomorphism follows from the previous theorem. Any ring homomorphism from a field into a field is injective (see Exercise 7.47). Surjectivity follows from injectivity and finiteness.

For the second statement, observe that $\sigma(\alpha) = \alpha$ if and only if $\alpha$ is a root of the polynomial $X^q - X$, and since all $q$ elements of $F$ are already roots, by Theorem 7.14, there can be no other roots. □

As the Frobenius map on finite fields plays a fundamental role in the study of finite fields, let us develop a few simple properties right away. Suppose $E$ is a finite extension of $F$, and let $\sigma$ be the Frobenius map on $E$ over $F$. Since the composition of two $F$-algebra automorphisms is also an $F$-algebra automorphism, for every $i \ge 0$, the $i$-fold composition $\sigma^i$, which sends $\alpha \in E$ to $\alpha^{q^i} \in E$, is also an $F$-algebra automorphism. Since $\sigma$ is an $F$-algebra automorphism, the inverse function $\sigma^{-1}$ is also an $F$-algebra automorphism. Hence, $\sigma^i$ is an $F$-algebra automorphism for all $i \in \mathbb{Z}$. If $E$ has degree $\ell$ over $F$, then applying Theorem 19.5 to the field $E$, we see that $\sigma^\ell$ is the identity map. More generally, we have:

Theorem 19.9. Let $E$ be an extension of degree $\ell$ over $F$, and let $\sigma$ be the Frobenius map on $E$ over $F$. Then for all integers $i$ and $j$, we have $\sigma^i = \sigma^j$ if and only if $i \equiv j \pmod{\ell}$.

Proof. We may assume $i \ge j$. We have
$$\begin{aligned}
\sigma^i = \sigma^j &\iff \sigma^{i-j} = \sigma^0 \iff \alpha^{q^{i-j}} - \alpha = 0 \text{ for all } \alpha \in E \\
&\iff \Big(\prod_{\alpha \in E} (X - \alpha)\Big) \,\Big|\, (X^{q^{i-j}} - X) && \text{(by Theorem 7.13)} \\
&\iff (X^{q^\ell} - X) \mid (X^{q^{i-j}} - X) && \text{(by Theorem 19.6, applied to } E\text{)} \\
&\iff \ell \mid (i - j) && \text{(by Theorem 19.4)} \\
&\iff i \equiv j \pmod{\ell}.
\end{aligned}$$
□

From the above theorem, it follows that every power of the Frobenius map $\sigma$ can be written uniquely as $\sigma^i$ for some $i = 0, \ldots, \ell - 1$.

The following theorem generalizes Theorem 19.6: 

Theorem 19.10. For $k \ge 1$, let $P_k$ denote the product of all the monic irreducible polynomials in $F[X]$ of degree $k$. For all positive integers $\ell$, we have
$$X^{q^\ell} - X = \prod_{k \mid \ell} P_k,$$
where the product is over all positive divisors $k$ of $\ell$.

Proof. First, we claim that the polynomial $X^{q^\ell} - X$ is square-free. This follows immediately from Theorem 19.1, since $\mathbf{D}(X^{q^\ell} - X) = q^\ell X^{q^\ell - 1} - 1 = -1$. Being monic and square-free, $X^{q^\ell} - X$ is the product of the distinct monic irreducible polynomials that divide it.

Thus, we have reduced the proof to showing that if $f$ is a monic irreducible polynomial of degree $k$, then $f$ divides $X^{q^\ell} - X$ if and only if $k$ divides $\ell$.

So let $f$ be a monic irreducible polynomial of degree $k$. Let $E := F[X]/(f) = F[\xi]$, where $\xi := [X]_f \in E$. Observe that $E$ is an extension field of degree $k$ over $F$. Let $\sigma$ be the Frobenius map on $E$ over $F$.

First, we claim that $f$ divides $X^{q^\ell} - X$ if and only if $\sigma^\ell(\xi) = \xi$. Indeed, $f$ is the minimal polynomial of $\xi$ over $F$, and so $f$ divides $X^{q^\ell} - X$ if and only if $\xi$ is a root of $X^{q^\ell} - X$, which is the same as saying $\xi^{q^\ell} = \xi$, or equivalently, $\sigma^\ell(\xi) = \xi$.

Second, we claim that $\sigma^\ell(\xi) = \xi$ if and only if $\sigma^\ell(\alpha) = \alpha$ for all $\alpha \in E$. To see this, first suppose that $\sigma^\ell(\alpha) = \alpha$ for all $\alpha \in E$. Then in particular, this holds for $\alpha = \xi$. Conversely, suppose that $\sigma^\ell(\xi) = \xi$. Every $\alpha \in E$ can be written as $\alpha = g(\xi)$ for some $g \in F[X]$, and since $\sigma^\ell$ is an $F$-algebra homomorphism, by Theorem 16.7 we have
$$\sigma^\ell(\alpha) = \sigma^\ell(g(\xi)) = g(\sigma^\ell(\xi)) = g(\xi) = \alpha.$$

Finally, we see that $\sigma^\ell(\alpha) = \alpha$ for all $\alpha \in E$ if and only if $\sigma^\ell = \sigma^0$, which by Theorem 19.9 holds if and only if $k \mid \ell$. □
For $\ell \ge 1$, let $\Pi_F(\ell)$ denote the number of monic irreducible polynomials of degree $\ell$ in $F[X]$.

Theorem 19.11. For all $\ell \ge 1$, we have
$$q^\ell = \sum_{k \mid \ell} k\,\Pi_F(k). \qquad (19.1)$$

Proof. Just equate the degrees of both sides of the identity in Theorem 19.10. □

From Theorem 19.11 it is easy to deduce that $\Pi_F(\ell) > 0$ for all $\ell$, and in fact, one can prove a density result, essentially a "prime number theorem" for polynomials over finite fields:

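Theorem 19.12. For all $\ell \ge 1$, we have
$$\frac{q^\ell}{2\ell} \le \Pi_F(\ell) \le \frac{q^\ell}{\ell}, \qquad (19.2)$$
and
$$\Pi_F(\ell) = \frac{q^\ell}{\ell} + O\!\left(\frac{q^{\ell/2}}{\ell}\right). \qquad (19.3)$$

Proof. First, since all the terms in the sum on the right-hand side of (19.1) are non-negative, and $\ell\,\Pi_F(\ell)$ is one of these terms, we may deduce that $\ell\,\Pi_F(\ell) \le q^\ell$, which proves the second inequality in (19.2). Since this holds for all $\ell$, we have
$$\ell\,\Pi_F(\ell) = q^\ell - \sum_{\substack{k \mid \ell \\ k < \ell}} k\,\Pi_F(k) \ge q^\ell - \sum_{\substack{k \mid \ell \\ k < \ell}} q^k \ge q^\ell - \sum_{k=1}^{\lfloor \ell/2 \rfloor} q^k.$$
Let us set
$$S(q, \ell) := \sum_{k=1}^{\lfloor \ell/2 \rfloor} q^k = \frac{q}{q-1}\big(q^{\lfloor \ell/2 \rfloor} - 1\big),$$
so that $\ell\,\Pi_F(\ell) \ge q^\ell - S(q, \ell)$. It is easy to see that $S(q, \ell) = O(q^{\ell/2})$, which proves (19.3). For the first inequality of (19.2), it suffices to show that $S(q, \ell) \le q^\ell / 2$. One can verify this directly for $\ell \in \{1, 2, 3\}$, and for $\ell \ge 4$, we have
$$S(q, \ell) \le q^{\ell/2 + 1} \le q^{\ell - 1} \le q^\ell / 2.$$
□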

We note that the inequalities in (19.2) are tight, in the sense that $\Pi_F(\ell) = q^\ell / (2\ell)$ when $q = 2$ and $\ell = 2$, and $\Pi_F(\ell) = q^\ell / \ell$ when $\ell = 1$. The first inequality in (19.2) implies not only that $\Pi_F(\ell) > 0$, but that the fraction of all monic degree-$\ell$ polynomials that are irreducible is at least $1/(2\ell)$, while (19.3) says that this fraction gets arbitrarily close to $1/\ell$ as either $q$ or $\ell$ becomes sufficiently large.
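Identity (19.1) determines $\Pi_F(\ell)$ recursively from its values at the proper divisors of $\ell$, and only $q = |F|$ enters. A minimal Python sketch under that reading (the name `count_irreducibles` is hypothetical), which also checks the bounds (19.2) in exact integer arithmetic on a few small cases:

```python
def count_irreducibles(q, max_deg):
    """Return {ell: Pi_F(ell)} for 1 <= ell <= max_deg, where |F| = q.

    Solves identity (19.1), q^ell = sum over k | ell of k * Pi_F(k),
    for the k = ell term, reusing the counts already found for smaller divisors.
    """
    counts = {}
    for ell in range(1, max_deg + 1):
        smaller = sum(k * counts[k] for k in range(1, ell) if ell % k == 0)
        remaining = q ** ell - smaller
        assert remaining % ell == 0  # guaranteed by (19.1)
        counts[ell] = remaining // ell
    return counts


print(count_irreducibles(2, 6))  # {1: 2, 2: 1, 3: 2, 4: 3, 5: 6, 6: 9}

# Bounds (19.2): q^ell / (2 ell) <= Pi_F(ell) <= q^ell / ell, without division.
for q in (2, 3, 4, 5, 7):
    for ell, n in count_irreducibles(q, 12).items():
        assert 2 * ell * n >= q ** ell and ell * n <= q ** ell
```

The case $q = 2$, $\ell = 2$ in the output ($\Pi_F(2) = 1 = 2^2/(2 \cdot 2)$) is the equality case of the lower bound mentioned above.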
Exercise 19.1. Starting from Theorem 19.11, show that
$$\Pi_F(\ell) = \ell^{-1} \sum_{k \mid \ell} \mu(k)\, q^{\ell/k},$$
where $\mu$ is the Möbius function (see §2.9).

Exercise 19.2. How many irreducible polynomials of degree 30 over $\mathbb{Z}_2$ are there?


19.3 The subfield structure and uniqueness of finite fields 

Let $E$ be an extension of degree $\ell$ over a field $F$. If $K$ is an intermediate field, that is, a subfield of $E$ containing $F$, then Theorem 16.23 says that $(E : F) = (E : K)(K : F)$, and so in particular, the degree of $K$ over $F$ divides $\ell$.

In the case where $F$ is a finite field, we can say much more about such intermediate fields. Recall that if $\rho : E \to E$ is an $F$-algebra homomorphism, then the subalgebra of $E$ fixed by $\rho$ is defined as $K := \{\alpha \in E : \rho(\alpha) = \alpha\}$ (see Theorem 16.6). Not only is $K$ a subalgebra of $E$, but it is also a field, and so $K$ is itself an intermediate field.

Theorem 19.13. Let $E$ be an extension of degree $\ell$ over a finite field $F$. Let $\sigma$ be the Frobenius map on $E$ over $F$. Then the intermediate fields $K$, with $F \subseteq K \subseteq E$, are in one-to-one correspondence with the divisors $k$ of $\ell$, where the divisor $k$ corresponds to the subalgebra of $E$ fixed by $\sigma^k$, which has degree $k$ over $F$.

Proof. Let $q$ be the cardinality of $F$.

Suppose $k$ is a divisor of $\ell$. By Theorem 19.6 (applied to $E$), the polynomial $X^{q^\ell} - X$ splits into distinct monic linear factors over $E$. By Theorem 19.4, the polynomial $X^{q^k} - X$ divides $X^{q^\ell} - X$. Hence, $X^{q^k} - X$ also splits into distinct monic linear factors over $E$. This says that the subalgebra of $E$ fixed by $\sigma^k$, which consists of the roots of $X^{q^k} - X$, has precisely $q^k$ elements, and hence is an extension of degree $k$ over $F$.

Now let $K$ be an arbitrary intermediate field, and let $k$ be the degree of $K$ over $F$. As already mentioned, we must have $k \mid \ell$. Also, by Theorem 19.8 (applied with $K$ in place of $F$), $K$ is the subalgebra of $E$ fixed by $\sigma^k$. □

The next theorem shows that up to isomorphism, there is only one finite field of 
a given cardinality. 

Theorem 19.14. Let $E$ and $E'$ be finite extensions of the same degree over a finite field $F$. Then $E$ and $E'$ are isomorphic as $F$-algebras.

Proof. Let $q$ be the cardinality of $F$, and let $\ell$ be the degree of the extensions. As we have argued before, we have $E' = F[\alpha']$ for some $\alpha' \in E'$, and so $E'$ is isomorphic as an $F$-algebra to $F[X]/(\phi)$, where $\phi$ is the minimal polynomial of $\alpha'$ over $F$. As $\phi$ is an irreducible polynomial of degree $\ell$, by Theorem 19.10, $\phi$ divides $X^{q^\ell} - X$, and by Theorem 19.6 (applied to $E$), $X^{q^\ell} - X = \prod_{\alpha \in E} (X - \alpha)$, from which it follows that $\phi$ has a root $\alpha \in E$. Since $\phi$ is irreducible, $\phi$ is the minimal polynomial of $\alpha$ over $F$, and hence $F[\alpha]$ is isomorphic as an $F$-algebra to $F[X]/(\phi)$. Since $\alpha$ has degree $\ell$ over $F$, we must have $E = F[\alpha]$. Thus,
$$E = F[\alpha] \cong F[X]/(\phi) \cong F[\alpha'] = E'.$$
□


Exercise 19.3. This exercise develops an alternative proof for the existence of finite fields; however, it does not yield a density result for irreducible polynomials. Let $F$ be a finite field of cardinality $q$, and let $\ell \ge 1$ be an integer. Let $E$ be a splitting field for the polynomial $X^{q^\ell} - X \in F[X]$ (see Theorem 16.25), and let $\sigma$ be the Frobenius map on $E$ over $F$. Let $K$ be the subalgebra of $E$ fixed by $\sigma^\ell$. Show that $K$ is an extension of $F$ of degree $\ell$.

Exercise 19.4. Let $E$ be an extension of degree $\ell$ over a finite field $F$ of cardinality $q$. Show that at least half the elements of $E$ have degree $\ell$ over $F$, and that the total number of elements of degree $\ell$ over $F$ is $q^\ell + O(q^{\ell/2})$.

Exercise 19.5. Let $E$ be a finite extension of a finite field $F$, and suppose $\alpha, \beta \in E$, where $\alpha$ has degree $a$ over $F$, $\beta$ has degree $b$ over $F$, and $\gcd(a, b) = 1$. Show that $\beta$ has degree $b$ over $F[\alpha]$, that $\alpha$ has degree $a$ over $F[\beta]$, and that $\alpha + \beta$ has degree $ab$ over $F$. Hint: consider the subfields $F[\alpha]$, $F[\beta]$, $F[\alpha][\beta] = F[\alpha, \beta] = F[\beta][\alpha]$, and $F[\alpha + \beta]$, and their degrees over $F$.


19.4 Conjugates, norms and traces 

Throughout this section, $F$ denotes a finite field of cardinality $q$, $E$ denotes an extension of degree $\ell$ over $F$, and $\sigma$ denotes the Frobenius map on $E$ over $F$.

Consider an element $\alpha \in E$. We say that $\beta \in E$ is conjugate to $\alpha$ (over $F$) if $\beta = \sigma^i(\alpha)$ for some $i \in \mathbb{Z}$. The reader may verify that the "conjugate to" relation is an equivalence relation. We call the equivalence classes of this relation conjugacy classes, and we call the elements of the conjugacy class containing $\alpha$ the conjugates of $\alpha$.

Starting with $\alpha$, we can start listing conjugates:
$$\alpha, \sigma(\alpha), \sigma^2(\alpha), \ldots.$$
As $\sigma^\ell$ is the identity map, this list will eventually start repeating. Let $k$ be the smallest positive integer such that $\sigma^k(\alpha) = \sigma^i(\alpha)$ for some $i = 0, \ldots, k - 1$. It must be the case that $i = 0$; otherwise, applying $\sigma^{-1}$ to the equation $\sigma^k(\alpha) = \sigma^i(\alpha)$ would yield $\sigma^{k-1}(\alpha) = \sigma^{i-1}(\alpha)$, and since $0 \le i - 1 < k - 1$, this would contradict the minimality of $k$.

Thus, $\alpha, \sigma(\alpha), \ldots, \sigma^{k-1}(\alpha)$ are all distinct, and $\sigma^k(\alpha) = \alpha$. Moreover, for every integer $i$, we have $\sigma^i(\alpha) = \sigma^j(\alpha)$, where $j = i \bmod k$. Therefore, the $k$ distinct elements $\alpha, \sigma(\alpha), \ldots, \sigma^{k-1}(\alpha)$ are all the conjugates of $\alpha$. Also, $\sigma^i(\alpha) = \alpha$ if and only if $k$ divides $i$, and since $\sigma^\ell(\alpha) = \alpha$, it must be the case that $k$ divides $\ell$. In addition, the conjugates of $\alpha$ are powers of $\alpha$, and in particular, they all belong to $F[\alpha]$.

With $\alpha$ and $k$ as above, consider the polynomial
$$\phi := \prod_{i=0}^{k-1} (X - \sigma^i(\alpha)).$$
The coefficients of $\phi$ obviously lie in $E$, but we claim that in fact, they lie in $F$. This is easily seen as follows. Extend the domain of definition of $\sigma$ from $E$ to $E[X]$ by applying $\sigma$ coefficient-wise to polynomials; this yields a ring homomorphism from $E[X]$ into $E[X]$, which we also denote by $\sigma$ (see Example 7.46). Applying $\sigma$ to $\phi$, we obtain
$$\sigma(\phi) = \prod_{i=0}^{k-1} \sigma(X - \sigma^i(\alpha)) = \prod_{i=0}^{k-1} (X - \sigma^{i+1}(\alpha)) = \prod_{i=0}^{k-1} (X - \sigma^i(\alpha)),$$
since $\sigma^k(\alpha) = \alpha$. Thus we see that $\sigma(\phi) = \phi$. Writing $\phi = \sum_i c_i X^i$, it follows that $\sigma(c_i) = c_i$ for all $i$, and hence by Theorem 19.8, $c_i \in F$ for all $i$. Hence $\phi \in F[X]$. We further claim that $\phi$ is the minimal polynomial of $\alpha$. To see this, let $f \in F[X]$ be any polynomial over $F$ for which $\alpha$ is a root. Then for every integer $i$, by Theorem 16.7, we have
$$0 = \sigma^i(0) = \sigma^i(f(\alpha)) = f(\sigma^i(\alpha)).$$
Thus, all the conjugates of $\alpha$ are also roots of $f$, and so $\phi$ divides $f$. That proves that $\phi$ is the minimal polynomial of $\alpha$. Since $\phi$ is the minimal polynomial of $\alpha$ and $\deg(\phi) = k$, it follows that the number $k$ is none other than the degree of $\alpha$ over $F$. Let us summarize the above discussion as follows:

Theorem 19.15. Let $\alpha \in E$ be of degree $k$ over $F$, and let $\phi$ be the minimal polynomial of $\alpha$ over $F$. Then $k$ is the smallest positive integer such that $\sigma^k(\alpha) = \alpha$, the distinct conjugates of $\alpha$ are $\alpha, \sigma(\alpha), \ldots, \sigma^{k-1}(\alpha)$, and $\phi$ factors over $E$ (in fact, over $F[\alpha]$) as
$$\phi = \prod_{i=0}^{k-1} (X - \sigma^i(\alpha)).$$
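The proof of Theorem 19.15 is constructive: list the conjugates $\alpha, \sigma(\alpha), \ldots$ until $\alpha$ reappears, then multiply out $\prod_i (X - \sigma^i(\alpha))$. A minimal Python sketch, assuming $F = \mathbb{Z}_p$ (so $q = p$ and $\sigma$ is the $p$-power map) and $E = \mathbb{Z}_p[X]/(f)$ for a monic irreducible $f$; elements of $E$ are coefficient lists of length $\deg(f)$, constant term first, and all function names are hypothetical. The final assertion checks the claim that every coefficient lands in $F$.

```python
def mul_mod(a, b, f, p):
    """Product of a and b in E = Z_p[X]/(f); f is monic of degree n, lowest coefficient first."""
    n = len(f) - 1
    prod = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    for k in range(len(prod) - 1, n - 1, -1):  # cancel degree-k terms using f
        c = prod[k]
        if c:
            for j in range(n + 1):
                prod[k - n + j] = (prod[k - n + j] - c * f[j]) % p
    return prod[:n]


def frobenius(a, f, p):
    """sigma(a) = a^p, the Frobenius map on E over F = Z_p."""
    result = [1] + [0] * (len(f) - 2)
    for _ in range(p):
        result = mul_mod(result, a, f, p)
    return result


def minimal_polynomial(alpha, f, p):
    """Minimal polynomial of alpha over Z_p as prod (Y - sigma^i(alpha)), lowest coefficient first."""
    n = len(f) - 1
    zero, one = [0] * n, [1] + [0] * (n - 1)
    conjugates = [list(alpha)]
    nxt = frobenius(conjugates[0], f, p)
    while nxt != conjugates[0]:  # stops at k = degree of alpha, when sigma^k(alpha) = alpha
        conjugates.append(nxt)
        nxt = frobenius(nxt, f, p)
    poly = [one]  # a polynomial in Y whose coefficients are elements of E
    for c in conjugates:  # poly <- Y * poly - c * poly
        shifted = [zero] + poly
        scaled = [mul_mod(c, coef, f, p) for coef in poly] + [zero]
        poly = [[(s - t) % p for s, t in zip(u, v)] for u, v in zip(shifted, scaled)]
    # Theorem 19.15: every coefficient lies in F, i.e. is a constant of E
    assert all(all(x == 0 for x in coef[1:]) for coef in poly)
    return [coef[0] for coef in poly]
```

With $f = X^3 + X + 1$ over $\mathbb{Z}_2$, `minimal_polynomial([1, 1, 0], [1, 1, 0, 1], 2)` returns `[1, 0, 1, 1]`, so $X^3 + X^2 + 1$ is the minimal polynomial of $\xi + 1$, where $\xi = [X]_f$.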
Another useful way of reasoning about conjugates is as follows. First, if $\alpha = 0$, then the degree of $\alpha$ over $F$ is 1, and there is nothing more to say, so let us assume that $\alpha \in E^*$. If $r$ is the multiplicative order of $\alpha$, then note that every conjugate $\sigma^i(\alpha)$ also has multiplicative order $r$; this follows from the fact that for every positive integer $s$, $\alpha^s = 1$ if and only if $(\sigma^i(\alpha))^s = 1$. Also, note that we must have $r \mid |E^*| = q^\ell - 1$, or equivalently, $q^\ell \equiv 1 \pmod{r}$. Focusing now on the fact that $\sigma$ is the $q$-power map, we see that the degree $k$ of $\alpha$ is the smallest positive integer such that $\alpha^{q^k} = \alpha$, which holds if and only if $\alpha^{q^k - 1} = 1$, which holds if and only if $q^k \equiv 1 \pmod{r}$. Thus, the degree of $\alpha$ over $F$ is simply the multiplicative order of $q$ modulo $r$. Again, we summarize these observations as a theorem:

Theorem 19.16. If $\alpha \in E^*$ has multiplicative order $r$, then the degree of $\alpha$ over $F$ is equal to the multiplicative order of $q$ modulo $r$.
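Theorem 19.16 reduces computing a degree to computing a multiplicative order modulo $r$. A minimal Python sketch by direct search (the function name `degree_from_order` is hypothetical):

```python
from math import gcd


def degree_from_order(q, r):
    """Degree over F (|F| = q) of an element of E* of multiplicative order r (Theorem 19.16)."""
    assert r >= 1 and gcd(q, r) == 1  # automatic, since r divides q^ell - 1
    k, power = 1, q % r
    while power != 1 % r:  # 1 % r makes the case r == 1 (alpha = 1) return k = 1
        power = power * q % r
        k += 1
    return k
```

For instance, with $q = 2$ a generator of $E^*$ for a field $E$ of $16$ elements has order $15$, and `degree_from_order(2, 15)` returns `4`, as expected for an element generating $E$ over $\mathbb{Z}_2$.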

For $\alpha \in E$, define the polynomial
$$\chi := \prod_{i=0}^{\ell-1} (X - \sigma^i(\alpha)).$$
It is easy to see, using the same type of argument as was used to prove Theorem 19.15, that $\chi \in F[X]$, and indeed, that
$$\chi = \phi^{\ell/k},$$
where $k$ is the degree of $\alpha$ over $F$ and $\phi$ is its minimal polynomial. The polynomial $\chi$ is called the characteristic polynomial of $\alpha$ (from $E$ to $F$).

Two functions that are often useful are the "norm" and "trace." The norm of $\alpha$ (from $E$ to $F$) is defined as
$$\mathrm{N}_{E/F}(\alpha) := \prod_{i=0}^{\ell-1} \sigma^i(\alpha),$$
while the trace of $\alpha$ (from $E$ to $F$) is defined as
$$\mathrm{Tr}_{E/F}(\alpha) := \sum_{i=0}^{\ell-1} \sigma^i(\alpha).$$
It is easy to see that both the norm and trace of $\alpha$ are elements of $F$, as they are fixed by $\sigma$; alternatively, one can see this by observing that they appear, possibly with a minus sign, as coefficients of the characteristic polynomial $\chi$. Indeed, the constant term of $\chi$ is equal to $(-1)^\ell\,\mathrm{N}_{E/F}(\alpha)$, and the coefficient of $X^{\ell-1}$ in $\chi$ is $-\mathrm{Tr}_{E/F}(\alpha)$.

The following two theorems summarize the most important facts about the norm 
and trace functions. 
Theorem 19.17. The function $\mathrm{N}_{E/F}$, restricted to $E^*$, is a group homomorphism from $E^*$ onto $F^*$.

Proof. We have
$$\mathrm{N}_{E/F}(\alpha) = \prod_{i=0}^{\ell-1} \alpha^{q^i} = \alpha^{\sum_{i=0}^{\ell-1} q^i} = \alpha^{(q^\ell - 1)/(q - 1)}.$$
Since $E^*$ is a cyclic group of order $q^\ell - 1$, the image of the $(q^\ell - 1)/(q - 1)$-power map on $E^*$ is the unique subgroup of $E^*$ of order $q - 1$ (see Theorem 6.32). Since $F^*$ is a subgroup of $E^*$ of order $q - 1$, it follows that the image of this power map is $F^*$. □

Theorem 19.18. The function $\mathrm{Tr}_{E/F}$ is an $F$-linear map from $E$ onto $F$.

Proof. The fact that $\mathrm{Tr}_{E/F}$ is an $F$-linear map is a simple consequence of the fact that $\sigma$ is an $F$-linear map. As discussed above, $\mathrm{Tr}_{E/F}$ maps into $F$. Since the image of $\mathrm{Tr}_{E/F}$ is a subspace of $F$, the image is either $\{0\}$ or $F$, and so it suffices to show that $\mathrm{Tr}_{E/F}$ does not map all of $E$ to zero. But an element $\alpha \in E$ is in the kernel of $\mathrm{Tr}_{E/F}$ if and only if $\alpha$ is a root of the polynomial
$$X + X^q + \cdots + X^{q^{\ell-1}},$$
which has degree $q^{\ell-1}$. Since $E$ contains $q^\ell$ elements, not all elements of $E$ can lie in the kernel of $\mathrm{Tr}_{E/F}$. □
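To see Theorems 19.17 and 19.18 on small cases, the following Python sketch computes $\mathrm{Tr}_{E/F}$ and $\mathrm{N}_{E/F}$ directly from their definitions as a sum and a product over $\sigma^i(\alpha)$, $i = 0, \ldots, \ell - 1$. It assumes $F = \mathbb{Z}_p$ and $E = \mathbb{Z}_p[X]/(f)$ with $f$ monic irreducible of degree $\ell$, elements of $E$ being coefficient tuples with the constant term first; the function names are hypothetical.

```python
from itertools import product


def mul_mod(a, b, f, p):
    """Product of a and b in E = Z_p[X]/(f); f is monic of degree n, lowest coefficient first."""
    n = len(f) - 1
    prod = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    for k in range(len(prod) - 1, n - 1, -1):  # cancel degree-k terms using f
        c = prod[k]
        if c:
            for j in range(n + 1):
                prod[k - n + j] = (prod[k - n + j] - c * f[j]) % p
    return prod[:n]


def frobenius(a, f, p):
    """sigma(a) = a^p, the Frobenius map on E over F = Z_p."""
    result = [1] + [0] * (len(f) - 2)
    for _ in range(p):
        result = mul_mod(result, a, f, p)
    return result


def trace_and_norm(alpha, f, p):
    """(Tr_{E/F}(alpha), N_{E/F}(alpha)) as integers mod p."""
    n = len(f) - 1
    tr, nm = [0] * n, [1] + [0] * (n - 1)
    conj = list(alpha)
    for _ in range(n):  # conj runs through sigma^i(alpha), i = 0, ..., n - 1
        tr = [(x + y) % p for x, y in zip(tr, conj)]
        nm = mul_mod(nm, conj, f, p)
        conj = frobenius(conj, f, p)
    # both values are fixed by sigma, hence constants of E, i.e. elements of F
    assert not any(tr[1:]) and not any(nm[1:])
    return tr[0], nm[0]


def trace_norm_table(f, p):
    """Map every element of E (as a tuple) to its (trace, norm)."""
    n = len(f) - 1
    return {a: trace_and_norm(a, f, p) for a in product(range(p), repeat=n)}
```

For $E = \mathbb{Z}_2[X]/(X^3 + X + 1)$, exactly four of the eight elements have trace $0$ (the kernel of a surjective linear map onto $\mathbb{Z}_2$), and every non-zero element has norm $1$, since $F^* = \{1\}$.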

Example 19.1. As an application of some of the above theory, let us investigate the factorization of the polynomial $X^r - 1$ over $F$, a finite field of cardinality $q$. Let us assume that $r > 0$ and is relatively prime to $q$. Let $E$ be a splitting field of $X^r - 1$ (see Theorem 16.25), so that $E$ is a finite extension of $F$ in which $X^r - 1$ splits into linear factors:
$$X^r - 1 = \prod_{i=1}^{r} (X - \alpha_i).$$
We claim that the roots $\alpha_i$ of $X^r - 1$ are distinct; this follows from Theorem 19.1 and the fact that $\gcd(X^r - 1, rX^{r-1}) = 1$.

Next, observe that the $r$ roots of $X^r - 1$ in $E$ actually form a subgroup of $E^*$, and since $E^*$ is cyclic, this subgroup must be cyclic as well. So the roots of $X^r - 1$ form a cyclic subgroup of $E^*$ of order $r$. Let $\zeta$ be a generator for this group. Then all the roots of $X^r - 1$ are contained in $F[\zeta]$, and so we may as well assume that $E = F[\zeta]$.

Let us compute the degree of $\zeta$ over $F$. By Theorem 19.16, the degree $\ell$ of $\zeta$ over $F$ is the multiplicative order of $q$ modulo $r$. Moreover, the $\varphi(r)$ roots of $X^r - 1$ of multiplicative order $r$ are partitioned into $\varphi(r)/\ell$ conjugacy classes, each of size $\ell$ (here, $\varphi$ is Euler's phi function); indeed, as the reader is urged to verify, these conjugacy classes are in one-to-one correspondence with the cosets of the subgroup of $\mathbb{Z}_r^*$ generated by $[q]_r$, where each such coset $C \subseteq \mathbb{Z}_r^*$ corresponds to the conjugacy class $\{\zeta^a : [a]_r \in C\}$.

More generally, for every $s \mid r$, every root of $X^r - 1$ whose multiplicative order is $s$ has degree $k$ over $F$, where $k$ is the multiplicative order of $q$ modulo $s$. As above, the $\varphi(s)$ roots of multiplicative order $s$ are partitioned into $\varphi(s)/k$ conjugacy classes, which are in one-to-one correspondence with the cosets of the subgroup of $\mathbb{Z}_s^*$ generated by $[q]_s$.

This tells us exactly how $X^r - 1$ splits into irreducible factors over $F$. Things are a bit simpler when $r$ is prime, in which case, from the above discussion, we see that
$$X^r - 1 = (X - 1) \prod_{i=1}^{(r-1)/\ell} f_i,$$
where the $f_i$'s are distinct monic irreducible polynomials, each of degree $\ell$, and $\ell$ is the multiplicative order of $q$ modulo $r$.

In the above analysis, instead of constructing the field $E$ using Theorem 16.25, one could instead simply construct $E$ as $F[X]/(f)$, where $f$ is any irreducible polynomial of degree $\ell$, and where $\ell$ is the multiplicative order of $q$ modulo $r$. We know that such a polynomial $f$ exists by Theorem 19.12, and since $E$ has cardinality $q^\ell$, and $r \mid (q^\ell - 1) = |E^*|$, and $E^*$ is cyclic, we know that $E^*$ contains an element $\zeta$ of multiplicative order $r$, and each of the $r$ distinct powers $1, \zeta, \ldots, \zeta^{r-1}$ are roots of $X^r - 1$, and so this $E$ is a splitting field of $X^r - 1$ over $F$. □
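The correspondence in Example 19.1 is easy to compute: since $\sigma(\zeta^a) = \zeta^{aq}$, the conjugacy class of $\zeta^a$ is $\{\zeta^{a}, \zeta^{aq}, \zeta^{aq^2}, \ldots\}$, so the irreducible factors of $X^r - 1$ over $F$ correspond to the orbits of $a \mapsto aq \bmod r$ on $\{0, \ldots, r - 1\}$ (often called cyclotomic cosets). A minimal Python sketch, with a hypothetical function name, assuming $\gcd(q, r) = 1$ as in the example:

```python
from math import gcd


def cyclotomic_cosets(q, r):
    """Orbits of a -> a*q mod r on {0, ..., r - 1}.

    The orbit of a matches the conjugacy class of zeta^a, so the orbit sizes
    are the degrees of the irreducible factors of X^r - 1 over F (|F| = q).
    """
    assert r > 0 and gcd(q, r) == 1
    seen, cosets = set(), []
    for a in range(r):
        if a in seen:
            continue
        coset, b = [], a
        while b not in coset:  # q is invertible mod r, so the orbit cycles back to a
            coset.append(b)
            b = b * q % r
        seen.update(coset)
        cosets.append(coset)
    return cosets
```

For $q = 2$ and $r = 7$ this returns `[[0], [1, 2, 4], [3, 6, 5]]`: besides $X - 1$, the polynomial $X^7 - 1$ has $(7 - 1)/3 = 2$ irreducible cubic factors over $\mathbb{Z}_2$, in agreement with the prime case above.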

Exercise 19.6. Let $E$ be an extension of degree $\ell$ over a finite field $F$. Show that for $a \in F$, we have $\mathrm{N}_{E/F}(a) = a^\ell$ and $\mathrm{Tr}_{E/F}(a) = \ell a$.

Exercise 19.7. Let $E$ be a finite extension of a finite field $F$. Let $K$ be an intermediate field, $F \subseteq K \subseteq E$. Show that for all $\alpha \in E$:

(a) $\mathrm{N}_{E/F}(\alpha) = \mathrm{N}_{K/F}(\mathrm{N}_{E/K}(\alpha))$, and

(b) $\mathrm{Tr}_{E/F}(\alpha) = \mathrm{Tr}_{K/F}(\mathrm{Tr}_{E/K}(\alpha))$.

Exercise 19.8. Let $F$ be a finite field, and let $f \in F[X]$ be a monic irreducible polynomial of degree $\ell$. Let $E = F[X]/(f) = F[\xi]$, where $\xi := [X]_f$.

(a) Show that
$$\frac{\mathbf{D}(f)}{f} = \sum_{j=1}^{\infty} \mathrm{Tr}_{E/F}(\xi^{j-1})\, X^{-j}.$$

(b) From part (a), deduce that the sequence of elements
$$\mathrm{Tr}_{E/F}(\xi^{j-1}) \quad (j = 1, 2, \ldots)$$
is linearly generated over $F$ with minimal polynomial $f$.

(c) Show that one can always choose a polynomial $f$ so that the sequence in part (b) is purely periodic with period $q^\ell - 1$.

Exercise 19.9. Let $F$ be a finite field, and $f \in F[X]$ a monic irreducible polynomial of degree $k$ over $F$. Let $E$ be an extension of degree $\ell$ over $F$. Show that over $E$, $f$ factors as the product of $d$ distinct monic irreducible polynomials, each of degree $k/d$, where $d := \gcd(k, \ell)$.

Exercise 19.10. Let $E$ be a finite extension of a finite field $F$ of characteristic $p$. Show that if $\alpha \in E$ and $0 \ne a \in F$, and if $\alpha$ and $\alpha + a$ are conjugate over $F$, then $p$ divides the degree of $\alpha$ over $F$.

Exercise 19.11. Let $F$ be a finite field of characteristic $p$. For $a \in F$, consider the polynomial $f := X^p - X - a \in F[X]$.

(a) Show that if $F = \mathbb{Z}_p$ and $a \ne 0$, then $f$ is irreducible.

(b) More generally, show that if $\mathrm{Tr}_{F/\mathbb{Z}_p}(a) \ne 0$, then $f$ is irreducible, and otherwise, $f$ splits into distinct monic linear factors over $F$.

Exercise 19.12. Let $E$ be a finite extension of a finite field $F$. Show that every $F$-algebra automorphism on $E$ must be a power of the Frobenius map on $E$ over $F$.

Exercise 19.13. Show that for all primes $p$, the polynomial $X^4 + 1$ is reducible in $\mathbb{Z}_p[X]$. (Contrast this to the fact that this polynomial is irreducible in $\mathbb{Q}[X]$, as discussed in Exercise 16.49.)

Exercise 19.14. This exercise depends on the concepts and results in §18.6. Let $E$ be an extension of degree $\ell$ over a finite field $F$. Let $\sigma$ be the Frobenius map on $E$ over $F$.

(a) Show that the minimal polynomial of $\sigma$ over $F$ is $X^\ell - 1$.

(b) Show that there exists $\beta \in E$ such that the minimal polynomial of $\beta$ under $\sigma$ is $X^\ell - 1$.

(c) Conclude that $\beta, \sigma(\beta), \ldots, \sigma^{\ell-1}(\beta)$ form a basis for $E$ over $F$. This type of basis is called a normal basis.
This chapter discusses efficient algorithms for factoring polynomials over finite fields, and related problems, such as testing if a given polynomial is irreducible, and generating an irreducible polynomial of given degree.

Throughout this chapter, $F$ denotes a finite field of characteristic $p$ and cardinality $q = p^w$.

In addition to performing the usual arithmetic and comparison operations in $F$, we assume that our algorithms have access to the numbers $p$, $w$, and $q$, and have the ability to generate random elements of $F$. Generating such a random field element will count as one "operation in $F$," along with the usual arithmetic operations. Of course, the "standard" ways of representing $F$ as either $\mathbb{Z}_p$ (if $w = 1$), or as the ring of polynomials modulo an irreducible polynomial over $\mathbb{Z}_p$ of degree $w$ (if $w > 1$), satisfy the above requirements, and also allow for the implementation of arithmetic operations in $F$ that take time $O(\mathrm{len}(q)^2)$ on a RAM (using simple, quadratic-time arithmetic for polynomials and integers).


20.1 Tests for and constructing irreducible polynomials 

Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$. We develop here an efficient
algorithm that determines if $f$ is irreducible.

The idea is a simple application of Theorem 19.10. That theorem says that for every integer $k \ge 1$, the polynomial $X^{q^k} - X$ is the product of all monic irreducibles whose degree divides $k$. Thus, $\gcd(X^q - X, f)$ is the product of all the distinct linear factors of $f$. If $f$ has no linear factors, then $\gcd(X^{q^2} - X, f)$ is the product of all the distinct quadratic irreducible factors of $f$. And so on. Now, if $f$ is not irreducible, it must be divisible by some irreducible polynomial of degree at most $\ell/2$, and if $g$ is an irreducible factor of $f$ of minimal degree, say $k$, then we have $k \le \ell/2$ and $\gcd(X^{q^k} - X, f) \ne 1$. Conversely, if $f$ is irreducible, then $\gcd(X^{q^k} - X, f) = 1$ for all positive integers $k$ up to $\ell/2$. So to test if $f$ is irreducible, it suffices to check if $\gcd(X^{q^k} - X, f) = 1$ for all positive integers $k$ up to $\ell/2$: if so, we may conclude that $f$ is irreducible, and otherwise, we may conclude that $f$ is not irreducible.

To carry out the computation efficiently, we note that if $h \equiv X^{q^k} \pmod{f}$, then
$\gcd(h - X, f) = \gcd(X^{q^k} - X, f)$.

The above observations suggest the following algorithm. 

Algorithm IPT. On input $f$, where $f \in F[X]$ is a monic polynomial of degree
$\ell > 0$, determine if $f$ is irreducible as follows:

    h ← X mod f
    for k ← 1 to ⌊ℓ/2⌋ do
        h ← h^q mod f
        if gcd(h − X, f) ≠ 1 then return false
    return true
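For readers who want to experiment, here is a minimal Python sketch of Algorithm IPT. It assumes the simplest setting $F = \mathbb{Z}_p$ with $p$ prime (so $w = 1$ and $q = p$), represents a polynomial as a list of coefficients in $\mathbb{Z}_p$, lowest degree first, with `[]` for zero, and uses plain quadratic-time arithmetic. All function names (`trim`, `poly_divmod`, `poly_powmod`, `poly_gcd`, `is_irreducible`) are illustrative choices, not a library API.

```python
def trim(a):
    """Drop zero high-order coefficients in place; [] is the zero polynomial."""
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    """Quotient and remainder of a by a non-zero b over Z_p (p prime)."""
    a, inv = list(a), pow(b[-1], p - 2, p)     # inverse of lc(b) via Fermat
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_gcd(a, b, p):
    """Monic gcd by Euclid's algorithm ([] if both inputs are zero)."""
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], p - 2, p) % p for c in a] if a else []

def poly_powmod(a, e, f, p):
    """a^e mod f by repeated squaring."""
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

def is_irreducible(f, p):
    """Algorithm IPT for monic f of degree >= 1 over Z_p (here q = p)."""
    x = [0, 1]
    h = poly_divmod(x, f, p)[1]                # h <- X mod f
    for _ in range((len(f) - 1) // 2):         # k = 1, ..., floor(l/2)
        h = poly_powmod(h, p, f, p)            # now h = X^(q^k) mod f
        if poly_gcd(poly_sub(h, x, p), f, p) != [1]:
            return False
    return True

print(is_irreducible([1, 1, 0, 0, 1], 2))   # X^4 + X + 1 over Z_2: True
print(is_irreducible([1, 0, 0, 0, 1], 3))   # X^4 + 1 over Z_3: False
```

The $q$th power is computed by repeated squaring, as in the proof of Theorem 20.1; a field with $w > 1$ would need a different coefficient type.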

The correctness of Algorithm IPT follows immediately from the above discussion. As for the running time, we have:

Theorem 20.1. Algorithm IPT uses $O(\ell^3 \operatorname{len}(q))$ operations in $F$.

Proof. Consider an execution of a single iteration of the main loop. The cost of the $q$th-powering step (using a standard repeated-squaring algorithm) is $O(\operatorname{len}(q))$ multiplications modulo $f$, and so $O(\ell^2 \operatorname{len}(q))$ operations in $F$. The cost of the gcd computation is $O(\ell^2)$ operations in $F$. Thus, the cost of a single loop iteration is $O(\ell^2 \operatorname{len}(q))$ operations in $F$, from which it follows that the cost of the entire algorithm is $O(\ell^3 \operatorname{len}(q))$ operations in $F$. □

Using a standard representation for $F$, each operation in $F$ takes time $O(\operatorname{len}(q)^2)$ on a RAM, and so the running time of Algorithm IPT on a RAM is $O(\ell^3 \operatorname{len}(q)^3)$, which means that it is a polynomial-time algorithm.

Let us now consider the related problem of constructing an irreducible polynomial of specified degree $\ell > 0$. To do this, we can simply use the result of Theorem 19.12, which has the following probabilistic interpretation: if we choose a random, monic polynomial $f$ of degree $\ell$ over $F$, then the probability that $f$ is irreducible is at least $1/(2\ell)$. This suggests the following probabilistic algorithm:

Algorithm RIP. On input $\ell$, where $\ell$ is a positive integer, generate a monic irreducible polynomial $f \in F[X]$ of degree $\ell$ as follows:
    repeat
        choose c_0, ..., c_{ℓ−1} ∈ F at random
        set f ← X^ℓ + ∑_{i=0}^{ℓ−1} c_i X^i
        test if f is irreducible using Algorithm IPT
    until f is irreducible
    output f
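Under the same assumptions as the IPT sketch ($F = \mathbb{Z}_p$ with $p$ prime, coefficient lists lowest degree first, illustrative helper names), Algorithm RIP is a short generate-and-test loop. The helpers are repeated so the block runs on its own; the optional `rng` argument (any object with a `randrange` method, such as `random.Random`) is an addition for reproducibility, not part of the algorithm as stated.

```python
import random

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], p - 2, p) % p for c in a] if a else []

def poly_powmod(a, e, f, p):
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

def is_irreducible(f, p):
    """Algorithm IPT over Z_p."""
    x, h = [0, 1], poly_divmod([0, 1], f, p)[1]
    for _ in range((len(f) - 1) // 2):
        h = poly_powmod(h, p, f, p)
        if poly_gcd(poly_sub(h, x, p), f, p) != [1]:
            return False
    return True

def random_irreducible(ell, p, rng=random):
    """Algorithm RIP over Z_p: generate and test until f is irreducible."""
    while True:
        f = [rng.randrange(p) for _ in range(ell)] + [1]   # X^ell + random c_i X^i
        if is_irreducible(f, p):
            return f

print(random_irreducible(8, 2, random.Random(2024)))
```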

Theorem 20.2. Algorithm RIP uses an expected number of $O(\ell^4 \operatorname{len}(q))$ operations in $F$, and its output is uniformly distributed over all monic irreducibles of degree $\ell$.

Proof. This is a simple application of the generate-and-test paradigm (see Theorem 9.3, and Example 9.10 in particular). Because of Theorem 19.12, the expected number of loop iterations of the above algorithm is $O(\ell)$. Since Algorithm IPT uses $O(\ell^3 \operatorname{len}(q))$ operations in $F$, the statement about the running time of Algorithm RIP is immediate. The statement about its output distribution is clear. □

The expected running-time bound in Theorem 20.2 is actually a bit of an over-estimate. The reason is that if we generate a random polynomial of degree $\ell$, it is likely to have a small irreducible factor, which will be discovered very quickly by Algorithm IPT. In fact, it is known (see §20.7) that the expected value of the degree of the least degree irreducible factor of a random monic polynomial of degree $\ell$ over $F$ is $O(\operatorname{len}(\ell))$, from which it follows that the expected number of operations in $F$ performed by Algorithm RIP is actually $O(\ell^3 \operatorname{len}(\ell) \operatorname{len}(q))$.

Exercise 20.1. Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$. Also, let $\xi := [X]_f \in E$, where $E$ is the $F$-algebra $E := F[X]/(f)$.

(a) Given as input $\alpha \in E$ and $\xi^{q^m} \in E$ (for some integer $m > 0$), show how to compute the value $\alpha^{q^m} \in E$, using just $O(\ell^{2.5})$ operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$. Hint: see Theorems 16.7 and 19.7, as well as Exercise 17.3.

(b) Given as input $\xi^{q^m} \in E$ and $\xi^{q^{m'}} \in E$, where $m$ and $m'$ are positive integers, show how to compute the value $\xi^{q^{m+m'}} \in E$, using $O(\ell^{2.5})$ operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$.

(c) Given as input $\xi^q \in E$ and a positive integer $m$, show how to compute the value $\xi^{q^m} \in E$, using $O(\ell^{2.5} \operatorname{len}(m))$ operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$. Hint: use a repeated-squaring-like algorithm.

Exercise 20.2. This exercise develops an alternative irreducibility test.

(a) Show that a monic polynomial $f \in F[X]$ of degree $\ell > 0$ is irreducible if and only if $X^{q^\ell} \equiv X \pmod{f}$ and $\gcd(X^{q^{\ell/s}} - X, f) = 1$ for all primes $s \mid \ell$.

(b) Using part (a) and the result of the previous exercise, show how to determine if $f$ is irreducible using $O(\ell^{2.5} \operatorname{len}(\ell)\,\omega(\ell) + \ell^2 \operatorname{len}(q))$ operations in $F$, where $\omega(\ell)$ is the number of distinct prime factors of $\ell$.

(c) Show that the operation count in part (b) can be reduced to

$$O(\ell^{2.5} \operatorname{len}(\ell) \operatorname{len}(\omega(\ell)) + \ell^2 \operatorname{len}(q)).$$

Hint: see Exercise 3.39.

Exercise 20.3. Design and analyze a deterministic algorithm that takes as input a list of irreducible polynomials $f_1, \ldots, f_r \in F[X]$, where $\ell_i := \deg(f_i)$ for $i = 1, \ldots, r$, and assume that $\{\ell_i\}_{i=1}^r$ is pairwise relatively prime. Your algorithm should output an irreducible polynomial $f \in F[X]$ of degree $\ell := \prod_{i=1}^r \ell_i$ using $O(\ell^2)$ operations in $F$. Hint: use Exercise 19.5.

Exercise 20.4. Design and analyze a probabilistic algorithm that, given a monic irreducible polynomial $f \in F[X]$ of degree $\ell$ as input, generates as output a random monic irreducible polynomial $g \in F[X]$ of degree $\ell$ (i.e., $g$ should be uniformly distributed over all such polynomials), using an expected number of $O(\ell^{2.5})$ operations in $F$. Hint: use Exercise 18.9 (or alternatively, Exercise 18.10).

Exercise 20.5. Let $f \in F[X]$ be a monic irreducible polynomial of degree $\ell$, let $E := F[X]/(f)$, and let $\xi := [X]_f \in E$. Design and analyze a deterministic algorithm that takes as input the polynomial $f$ defining the extension $E$, and outputs the values

$$s_j := \mathrm{Tr}_{E/F}(\xi^j) \in F \quad (j = 0, \ldots, \ell - 1),$$

using $O(\ell^2)$ operations in $F$. Here, $\mathrm{Tr}_{E/F}$ is the trace from $E$ to $F$ (see §19.4). Show that given an arbitrary $\alpha \in E$, along with the values $s_0, \ldots, s_{\ell-1}$, one can compute $\mathrm{Tr}_{E/F}(\alpha)$ using just $O(\ell)$ operations in $F$.


20.2 Computing minimal polynomials in F[X]/(f) (III) 

We consider, for the third and final time, the problem considered in §17.2 and §18.5: $f \in F[X]$ is a monic polynomial of degree $\ell > 0$, and $E := F[X]/(f) = F[\xi]$, where $\xi := [X]_f$; we are given an element $\alpha \in E$, and want to compute the minimal polynomial $\phi \in F[X]$ of $\alpha$ over $F$. We develop an alternative algorithm, based on the theory of finite fields. Unlike the algorithms in §17.2 and §18.5, this algorithm only works when $F$ is finite and the polynomial $f$ is irreducible, so that $E$ is also a finite field.

From Theorem 19.15, we know that the degree of $\alpha$ over $F$ is the smallest positive integer $k$ such that $\alpha^{q^k} = \alpha$. By successive $q$th powering, we can determine the degree $k$ and compute the conjugates $\alpha, \alpha^q, \ldots, \alpha^{q^{k-1}}$ of $\alpha$, using $O(k \operatorname{len}(q))$ operations in $E$, and hence $O(k\ell^2 \operatorname{len}(q))$ operations in $F$.

Now, we could simply compute the minimal polynomial $\phi$ by directly using the formula

$$\phi(Y) = \prod_{i=0}^{k-1} (Y - \alpha^{q^i}). \tag{20.1}$$

This would involve computations with polynomials in the variable $Y$ whose coefficients lie in the extension field $E$, although at the end of the computation, we would end up with a polynomial all of whose coefficients lie in $F$. The cost of this approach would be $O(k^2)$ operations in $E$, and hence $O(k^2\ell^2)$ operations in $F$.

A more efficient approach is the following. Substituting $\xi$ for $Y$ in the identity (20.1), we have

$$\phi(\xi) = \prod_{i=0}^{k-1} (\xi - \alpha^{q^i}).$$

Using this formula, we can compute (given the conjugates of $\alpha$) the value $\phi(\xi) \in E$ using $O(k)$ operations in $E$, and hence $O(k\ell^2)$ operations in $F$. Now, $\phi(\xi)$ is an element of $E$, and for computational purposes, it is represented as $[g]_f$ for some polynomial $g \in F[X]$ of degree less than $\ell$. Moreover, $\phi(\xi) = [\phi]_f$, and hence $\phi \equiv g \pmod{f}$. In particular, if $k < \ell$, then $g = \phi$; otherwise, if $k = \ell$, then $g = \phi - f$. In either case, we can recover $\phi$ from $g$ with an additional $O(\ell)$ operations in $F$.

Thus, given the conjugates of $\alpha$, we can compute $\phi$ using $O(k\ell^2)$ operations in $F$. Adding in the cost of computing the conjugates, this gives rise to an algorithm that computes the minimal polynomial of $\alpha$ using $O(k\ell^2 \operatorname{len}(q))$ operations in $F$.
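The following Python sketch implements this conjugate-based method, assuming $F = \mathbb{Z}_p$ (so $q = p$), a monic irreducible $f$, and coefficient lists lowest degree first. It collects the conjugates by successive $q$th powering, forms $g = \operatorname{rep}(\phi(\xi))$ as a product in $E$, and adds $f$ back when $k = \ell$. The name `min_poly` and the helpers are illustrative, not from a library.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_powmod(a, e, f, p):
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

def min_poly(alpha, f, p):
    """Minimal polynomial over Z_p of alpha in E = Z_p[X]/(f), f monic irreducible."""
    ell = len(f) - 1
    alpha = poly_divmod(alpha, f, p)[1]
    conj, beta = [alpha], poly_powmod(alpha, p, f, p)
    while beta != alpha:                       # alpha, alpha^q, ..., alpha^(q^(k-1))
        conj.append(beta)
        beta = poly_powmod(beta, p, f, p)
    xi = poly_divmod([0, 1], f, p)[1]          # xi = [X]_f
    g = [1]
    for c in conj:                             # g = rep(prod_i (xi - alpha^(q^i)))
        g = poly_divmod(poly_mul(g, poly_sub(xi, c, p), p), f, p)[1]
    return g if len(conj) < ell else poly_add(g, f, p)   # if k = l, phi = g + f

# X^2 + X in GF(16) = Z_2[X]/(X^4 + X + 1) has order 3, so it lies in GF(4).
print(min_poly([0, 1, 1], [1, 1, 0, 0, 1], 2))   # [1, 1, 1], i.e. X^2 + X + 1
```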

In the worst case, then, this algorithm uses $O(\ell^3 \operatorname{len}(q))$ operations in $F$. A reasonably careful implementation needs space for storing a constant number of elements of $E$, and hence $O(\ell)$ elements of $F$. For very small values of $q$, the efficiency of this algorithm will be comparable to that of the algorithm in §18.5, but for large $q$, it will be much less efficient. Thus, this approach does not really yield a better algorithm, but it does serve to illustrate some of the ideas of the theory of finite fields.


20.3 Factoring polynomials: square-free decomposition 

In the remaining sections of this chapter, we develop efficient algorithms for factoring polynomials over the finite field $F$. We begin in this section with a simple and efficient preprocessing step. Recall that a polynomial is called square-free if it is not divisible by the square of any polynomial of degree greater than zero. This preprocessing algorithm takes the polynomial to be factored, and partially factors it into a product of square-free polynomials. Given this algorithm, we can focus our attention on the problem of factoring square-free polynomials.

Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$. Suppose that $f$ is not square-free. According to Theorem 19.1, $d := \gcd(f, \mathbf{D}(f)) \ne 1$, where $\mathbf{D}(f)$ is the formal derivative of $f$; thus, we might hope to get a non-trivial factorization of $f$ by computing $d$. However, we have to consider the possibility that $d = f$. Can this happen? The answer is “yes,” but if it does happen that $d = f$, we can still get a non-trivial factorization of $f$ by other means:

Theorem 20.3. Suppose that $f \in F[X]$ is a monic polynomial of degree $\ell > 0$, and that $\gcd(f, \mathbf{D}(f)) = f$. Then $f = g(X^p)$ for some $g \in F[X]$. Moreover, if $g = \sum_i a_i X^i$, then $f = h^p$, where

$$h = \sum_i a_i^{p^{w-1}} X^i. \tag{20.2}$$

Proof. Since $\deg(\mathbf{D}(f)) < \deg(f)$ and $\gcd(f, \mathbf{D}(f)) = f$, we must have $\mathbf{D}(f) = 0$. If $f = \sum_i c_i X^i$, then $\mathbf{D}(f) = \sum_i i c_i X^{i-1}$. Since this derivative must be zero, it follows that all the coefficients $c_i$ with $i \not\equiv 0 \pmod{p}$ must be zero to begin with. That proves that $f = g(X^p)$ for some $g \in F[X]$. Furthermore, if $h$ is defined as above, then

$$h^p = \Bigl(\sum_i a_i^{p^{w-1}} X^i\Bigr)^p = \sum_i a_i^{p^w} X^{ip} = \sum_i a_i (X^p)^i = g(X^p) = f. \qquad \square$$
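In the prime-field case $w = 1$, the exponent $p^{w-1}$ in (20.2) equals 1, so taking a $p$th root just keeps every $p$th coefficient. The Python sketch below (an illustrative helper `pth_root`, coefficient lists lowest degree first) shows this; for $w > 1$ each kept coefficient would additionally have to be raised to the power $p^{w-1}$, which this sketch does not do.

```python
def pth_root(f, p):
    """Return h with h^p = f, for f in Z_p[X] with D(f) = 0 (the case w = 1 of (20.2))."""
    # D(f) = 0 means every coefficient of X^i with p not dividing i is zero.
    if any(c % p for i, c in enumerate(f) if i % p != 0):
        raise ValueError("f is not of the form g(X^p)")
    # f = sum_i a_i X^(ip) = g(X^p); over Z_p, a_i^(p^(w-1)) = a_i, so h = g.
    return f[::p]

# Over Z_3: (X^2 + 2X + 1)^3 = X^6 + 2X^3 + 1, since 2^3 = 8 = 2 in Z_3.
print(pth_root([1, 0, 0, 2, 0, 0, 1], 3))   # [1, 2, 1]
```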

Our goal now is to design an efficient algorithm that takes as input a monic polynomial $f \in F[X]$ of degree $\ell > 0$, and outputs a list of pairs $((g_1, s_1), \ldots, (g_t, s_t))$, where

• each $g_i \in F[X]$ is monic, non-constant, and square-free,

• each $s_i$ is a positive integer,

• the family of polynomials $\{g_i\}_{i=1}^t$ is pairwise relatively prime, and

• $f = \prod_{i=1}^t g_i^{s_i}$.

We call such a list a square-free decomposition of $f$. There are a number of ways to do this. The algorithm we present is based on the following theorem, which itself is a simple consequence of Theorem 20.3.

Theorem 20.4. Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$. Suppose that the factorization of $f$ into irreducibles is $f = f_1^{e_1} \cdots f_r^{e_r}$. Then

$$\frac{f}{\gcd(f, \mathbf{D}(f))} = \prod_{\substack{1 \le i \le r \\ e_i \not\equiv 0 \pmod{p}}} f_i.$$
Proof. The theorem can be restated in terms of the following claim: for each $i = 1, \ldots, r$, we have

• $f_i^{e_i} \mid \mathbf{D}(f)$ if $e_i \equiv 0 \pmod{p}$, and

• $f_i^{e_i - 1} \mid \mathbf{D}(f)$ but $f_i^{e_i} \nmid \mathbf{D}(f)$ if $e_i \not\equiv 0 \pmod{p}$.

To prove the claim, we take formal derivatives using the usual rule for products, obtaining

$$\mathbf{D}(f) = \sum_j e_j f_j^{e_j - 1} \mathbf{D}(f_j) \prod_{k \ne j} f_k^{e_k}. \tag{20.3}$$

Consider a fixed index $i$. Clearly, $f_i^{e_i}$ divides every term in the sum on the right-hand side of (20.3), with the possible exception of the term with $j = i$. In the case where $e_i \equiv 0 \pmod{p}$, the term with $j = i$ vanishes, and that proves the claim in this case. So assume that $e_i \not\equiv 0 \pmod{p}$. By the previous theorem, and the fact that $f_i$ is irreducible, and in particular, not the $p$th power of any polynomial, we see that $\mathbf{D}(f_i)$ is non-zero, and (of course) has degree strictly less than that of $f_i$. From this, and (again) the fact that $f_i$ is irreducible, it follows that the term with $j = i$ is divisible by $f_i^{e_i - 1}$, but not by $f_i^{e_i}$, from which the claim follows. □

This theorem provides the justification for the following square-free decomposition algorithm.

Algorithm SFD. On input $f$, where $f \in F[X]$ is a monic polynomial of degree $\ell > 0$, compute a square-free decomposition of $f$ as follows:

    initialize an empty list L
    s ← 1
    repeat
        j ← 1, g ← f/gcd(f, D(f))
        while g ≠ 1 do
            f ← f/g, h ← gcd(f, g), m ← g/h
            if m ≠ 1 then append (m, js) to L
            g ← h, j ← j + 1
        if f ≠ 1 then    // f is a pth power
            // compute a pth root as in (20.2)
            f ← f^{1/p}, s ← ps
    until f = 1
    output L
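A Python sketch of Algorithm SFD for the prime-field case $F = \mathbb{Z}_p$ may help in following the algorithm. With $w = 1$, the $p$th root of (20.2) is just every $p$th coefficient. Polynomials are coefficient lists, lowest degree first; `derivative`, `sfd`, and the other helpers are illustrative names, not a library API.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], p - 2, p) % p for c in a] if a else []

def derivative(f, p):
    """Formal derivative D(f) over Z_p."""
    return trim([i * c % p for i, c in enumerate(f)][1:])

def sfd(f, p):
    """Algorithm SFD over Z_p: returns [(g_1, s_1), ...] with f = prod g_i^s_i."""
    L, s = [], 1
    while True:
        j = 1
        g = poly_divmod(f, poly_gcd(f, derivative(f, p), p), p)[0]
        while g != [1]:
            f = poly_divmod(f, g, p)[0]
            h = poly_gcd(f, g, p)
            m = poly_divmod(g, h, p)[0]
            if m != [1]:
                L.append((m, j * s))
            g, j = h, j + 1
        if f == [1]:
            return L
        f, s = f[::p], p * s        # f is a pth power; with w = 1 its pth root is f[::p]

# f = X (X + 1)^2 (X^2 + 1)^3 over Z_3
f = poly_mul(poly_mul([0, 1], poly_mul([1, 1], [1, 1], 3), 3),
             poly_mul([1, 0, 1], poly_mul([1, 0, 1], [1, 0, 1], 3), 3), 3)
print(sfd(f, 3))   # [([0, 1], 1), ([1, 1], 2), ([1, 0, 1], 3)]
```

In this example the pair for $X^2 + 1$ comes from the second pass of the main loop, after the cube root has been taken and $s$ has become 3.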

Theorem 20.5. Algorithm SFD correctly computes a square-free decomposition of $f$ using $O(\ell^2 + \ell(w - 1)\operatorname{len}(p)/p)$ operations in $F$.

Proof. Let $f = \prod_i f_i^{e_i}$ be the factorization of the input $f$ into irreducibles. Let $S$ be the set of indices $i$ such that $e_i \not\equiv 0 \pmod{p}$, and let $S'$ be the set of indices $i$ such that $e_i \equiv 0 \pmod{p}$. Also, for $j \ge 1$, let $S_{\ge j} := \{i \in S : e_i \ge j\}$ and $S_{=j} := \{i \in S : e_i = j\}$.

Consider the first iteration of the main loop. By Theorem 20.4, the value first assigned to $g$ is $\prod_{i \in S} f_i$. It is straightforward to prove by induction on $j$ that at the beginning of the $j$th iteration of the inner while loop, the value assigned to $g$ is $\prod_{i \in S_{\ge j}} f_i$, and the value assigned to $f$ is $\prod_{i \in S_{\ge j}} f_i^{e_i - j + 1} \cdot \prod_{i \in S'} f_i^{e_i}$. Moreover, in the $j$th loop iteration, the value assigned to $m$ is $\prod_{i \in S_{=j}} f_i$. It follows that when the while loop terminates, the value assigned to $f$ is $\prod_{i \in S'} f_i^{e_i}$, and the value assigned to $L$ is a square-free decomposition of $\prod_{i \in S} f_i^{e_i}$; if $f$ does not equal 1 at this point, then subsequent iterations of the main loop will append to $L$ a square-free decomposition of $\prod_{i \in S'} f_i^{e_i}$.

That proves the correctness of the algorithm. Now consider its running time. Again, consider just the first iteration of the main loop. The cost of computing $f/\gcd(f, \mathbf{D}(f))$ is at most $C_1 \ell^2$ operations in $F$, for some constant $C_1$. Now consider the cost of the inner while loop. It is not hard to see that the cost of the $j$th iteration of the inner while loop is at most

$$C_2 \ell \sum_{i \in S_{\ge j}} \deg(f_i)$$

operations in $F$, for some constant $C_2$. This follows from the observation in the previous paragraph that the value assigned to $g$ is $\prod_{i \in S_{\ge j}} f_i$, along with our usual cost estimates for division and Euclid’s algorithm. Therefore, the total cost of all iterations of the inner while loop is at most

$$C_2 \ell \sum_{j \ge 1} \sum_{i \in S_{\ge j}} \deg(f_i)$$

operations in $F$. In this double summation, for each $i \in S$, the term $\deg(f_i)$ is counted exactly $e_i$ times, and so we can write this cost estimate as

$$C_2 \ell \sum_{i \in S} e_i \deg(f_i) \le C_2 \ell^2.$$

Finally, it is easy to see that in the if-then statement at the end of the main loop body, if the algorithm does in fact compute a $p$th root, then this takes at most

$$C_3 \ell (w - 1) \operatorname{len}(p)/p$$

operations in $F$, for some constant $C_3$. Thus, we have shown that the total cost of the first iteration of the main loop is at most

$$(C_1 + C_2)\ell^2 + C_3 \ell (w - 1) \operatorname{len}(p)/p$$
operations in $F$. If the main loop is executed a second time, the degree of $f$ at the start of the second iteration is at most $\ell/p$, and hence the cost of the second loop iteration is at most

$$(C_1 + C_2)(\ell/p)^2 + C_3 (\ell/p)(w - 1) \operatorname{len}(p)/p$$

operations in $F$. More generally, for $t = 1, 2, \ldots$, the cost of loop iteration $t$ is at most

$$(C_1 + C_2)(\ell/p^{t-1})^2 + C_3 (\ell/p^{t-1})(w - 1) \operatorname{len}(p)/p,$$

operations in $F$, and summing over all $t \ge 1$ yields the stated bound. □

20.4 Factoring polynomials: the Cantor-Zassenhaus algorithm 

In this section, we present an algorithm due to Cantor and Zassenhaus for factoring a given polynomial over the finite field $F$ into irreducibles. We shall assume that the input polynomial is square-free, using Algorithm SFD in §20.3 as a preprocessing step, if necessary. The algorithm has two stages:

Distinct Degree Factorization: The input polynomial is decomposed into factors so that each factor is a product of distinct irreducibles of the same degree (and the degree of those irreducibles is also determined).

Equal Degree Factorization: Each of the factors produced in the distinct degree factorization stage is further factored into its irreducible factors.

The algorithm we present for distinct degree factorization is a deterministic, polynomial-time algorithm. The algorithm we present for equal degree factorization is a probabilistic algorithm that runs in expected polynomial time (and whose output is always correct).


20.4.1 Distinct degree factorization 

The problem, more precisely stated, is this: given a monic, square-free polynomial $f \in F[X]$ of degree $\ell > 0$, produce a list of pairs $((g_1, k_1), \ldots, (g_t, k_t))$ where

• each $g_i$ is the product of monic irreducible polynomials of degree $k_i$, and

• $f = \prod_{i=1}^t g_i$.

This problem can be easily solved using Theorem 19.10, using a simple variation of the algorithm we discussed in §20.1 for irreducibility testing. The basic idea is this. We can compute $g := \gcd(X^q - X, f)$, so that $g$ is the product of all the linear factors of $f$. After removing all linear factors from $f$, we next compute $\gcd(X^{q^2} - X, f)$, which will be the product of all the quadratic irreducibles dividing $f$, and we can remove these from $f$: although $X^{q^2} - X$ is the product of all linear and quadratic irreducibles, since we have already removed the linear factors from $f$, the gcd will give us just the quadratic factors of $f$. In general, for $k = 1, \ldots, \ell$, having removed all the irreducible factors of degree less than $k$ from $f$, we compute $\gcd(X^{q^k} - X, f)$ to obtain the product of all the irreducible factors of $f$ of degree $k$, and then remove these from $f$.

The above discussion leads to the following algorithm for distinct degree factorization.

Algorithm DDF. On input $f$, where $f \in F[X]$ is a monic square-free polynomial of degree $\ell > 0$, do the following:

    initialize an empty list L
    h ← X mod f
    k ← 0
    while f ≠ 1 do
        h ← h^q mod f, k ← k + 1
        g ← gcd(h − X, f)
        if g ≠ 1 then
            append (g, k) to L
            f ← f/g
            h ← h mod f
    output L
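Here is a Python sketch of Algorithm DDF for $F = \mathbb{Z}_p$, using coefficient lists lowest degree first and illustrative helper names. The final reduction `h = poly_divmod(h, f, p)[1]` keeps $h \equiv X^{q^k}$ modulo the shrinking $f$, which is valid because the new $f$ divides the old one.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], p - 2, p) % p for c in a] if a else []

def poly_powmod(a, e, f, p):
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

def ddf(f, p):
    """Algorithm DDF over Z_p for monic square-free f; returns [(g, k), ...]."""
    x, L, k = [0, 1], [], 0
    h = poly_divmod(x, f, p)[1]                  # h <- X mod f
    while f != [1]:
        h, k = poly_powmod(h, p, f, p), k + 1    # h = X^(q^k) mod f
        g = poly_gcd(poly_sub(h, x, p), f, p)
        if g != [1]:
            L.append((g, k))
            f = poly_divmod(f, g, p)[0]
            h = poly_divmod(h, f, p)[1]
    return L

# f = X (X + 1) (X^2 + X + 1) (X^3 + X + 1) over Z_2
f = poly_mul(poly_mul([0, 1, 1], [1, 1, 1], 2), [1, 1, 0, 1], 2)
print(ddf(f, 2))   # [([0, 1, 1], 1), ([1, 1, 1], 2), ([1, 1, 0, 1], 3)]
```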

The correctness of Algorithm DDF follows from the discussion above. As for 
the running time: 

Theorem 20.6. Algorithm DDF uses $O(\ell^3 \operatorname{len}(q))$ operations in $F$.

Proof. Note that the body of the main loop is executed at most $\ell$ times, since after $\ell$ iterations, we will have removed all the factors of $f$. Thus, we perform at most $\ell$ $q$th-powering steps, each of which takes $O(\ell^2 \operatorname{len}(q))$ operations in $F$, and so the total contribution to the running time of these is $O(\ell^3 \operatorname{len}(q))$ operations in $F$. We also have to take into account the cost of the gcd and division computations. The cost per loop iteration of these is $O(\ell^2)$ operations in $F$, contributing a term of $O(\ell^3)$ to the total operation count. This term is dominated by the cost of the $q$th-powering steps, and so the total cost of Algorithm DDF is $O(\ell^3 \operatorname{len}(q))$ operations in $F$. □
20.4.2 Equal degree factorization

The problem, more precisely stated, is this: given a monic polynomial $f \in F[X]$ of degree $\ell > 0$, and an integer $k > 0$, such that $f$ is of the form

$$f = f_1 \cdots f_r$$

for distinct monic irreducible polynomials $f_1, \ldots, f_r$, each of degree $k$, compute these irreducible factors of $f$. Note that given $f$ and $k$, the value of $r$ is easily determined, since $r = \ell/k$.

We begin by discussing the basic mathematical ideas that will allow us to efficiently split $f$ into two non-trivial factors, and then we present a somewhat more elaborate algorithm that completely factors $f$.

By the Chinese remainder theorem, we have an $F$-algebra isomorphism

$$\theta : E \to E_1 \times \cdots \times E_r, \qquad [g]_f \mapsto ([g]_{f_1}, \ldots, [g]_{f_r}),$$

where $E$ is the $F$-algebra $F[X]/(f)$, and for $i = 1, \ldots, r$, $E_i$ is the extension field $F[X]/(f_i)$ of degree $k$ over $F$.

Recall that $q = p^w$. We have to treat the cases $p = 2$ and $p > 2$ separately. We first treat the case $p = 2$. Let us define the polynomial

$$M_k := \sum_{j=0}^{wk-1} X^{2^j} \in F[X]. \tag{20.4}$$

(The algorithm in the case $p > 2$ will only differ in the definition of $M_k$.)

For $\alpha \in E$, if $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, then we have

$$\theta(M_k(\alpha)) = M_k(\theta(\alpha)) = (M_k(\alpha_1), \ldots, M_k(\alpha_r)).$$

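Note that each $E_i$ is an extension of $\mathbb{Z}_2$ of degree $wk$, and that

$$M_k(\alpha_i) = \sum_{j=0}^{wk-1} \alpha_i^{2^j} = \mathrm{Tr}_{E_i/\mathbb{Z}_2}(\alpha_i),$$

where $\mathrm{Tr}_{E_i/\mathbb{Z}_2} : E_i \to \mathbb{Z}_2$ is the trace from $E_i$ to $\mathbb{Z}_2$, which is a surjective, $\mathbb{Z}_2$-linear map (see §19.4).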
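To see the trace map (20.4) in a tiny case, the Python sketch below takes $p = 2$, $w = 1$, $k = 3$ and a single factor $f = X^3 + X + 1$, so that $E = \mathbb{Z}_2[X]/(f)$ is a field with 8 elements. It evaluates $M_3(\alpha) = \alpha + \alpha^2 + \alpha^4$ at every $\alpha \in E$: each value is a constant in $\mathbb{Z}_2$ and, because the trace is surjective and $\mathbb{Z}_2$-linear, exactly half of the values are 0. The helper names, including `trace_M`, are illustrative.

```python
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def trace_M(a, k, f):
    """rep(M_k(a)) = a + a^2 + ... + a^(2^(k-1)) in Z_2[X]/(f), with w = 1."""
    total, t = [], a
    for _ in range(k):
        total = poly_add(total, t, 2)
        t = poly_divmod(poly_mul(t, t, 2), f, 2)[1]   # next Frobenius power
    return total

f = [1, 1, 0, 1]                                       # X^3 + X + 1, so E = GF(8)
elements = [trim([(n >> i) & 1 for i in range(3)]) for n in range(8)]
values = [trace_M(a, 3, f) for a in elements]
print(values)   # [[], [1], [], [1], [], [1], [], [1]]
```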

Now, suppose we choose $\alpha \in E$ at random. Then if $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, the family of random variables $\{\alpha_i\}_{i=1}^r$ is mutually independent, with each $\alpha_i$ uniformly distributed over $E_i$. It follows that the family of random variables $\{M_k(\alpha_i)\}_{i=1}^r$ is mutually independent, with each $M_k(\alpha_i)$ uniformly distributed over $\mathbb{Z}_2$. Thus, if $g := \operatorname{rep}(M_k(\alpha))$ (i.e., $g \in F[X]$ is the polynomial of degree less than $\ell$ such that $M_k(\alpha) = [g]_f$), then $\gcd(g, f)$ will be the product of those factors $f_i$ of $f$ such that $M_k(\alpha_i) = 0$. We will fail to get a non-trivial factorization only if the $M_k(\alpha_i)$ are either all 0 or all 1, which for $r \ge 2$ happens with probability at most 1/2 (the worst case being when $r = 2$).

That is our basic splitting strategy. The algorithm for completely factoring $f$ works as follows. The algorithm proceeds in stages. At any stage, we have a partial factorization $f = \prod_{h \in H} h$, where $H$ is a set of non-constant, monic polynomials. Initially, $H = \{f\}$. With each stage, we attempt to get a finer factorization of $f$ by trying to split each $h \in H$ using the above splitting strategy: if we succeed in splitting $h$ into two non-trivial factors, then we replace $h$ by these two factors. We continue in this way until $|H| = r$.

Here is the full equal degree factorization algorithm. 

Algorithm EDF. On input $f, k$, where $f \in F[X]$ is a monic polynomial of degree $\ell > 0$, and $k$ is a positive integer, such that $f$ is the product of $r := \ell/k$ distinct monic irreducible polynomials, each of degree $k$, do the following, with $M_k$ as defined in (20.4):

    H ← {f}
    while |H| < r do
        H′ ← ∅
        for each h ∈ H do
            choose α ∈ F[X]/(h) at random
            d ← gcd(rep(M_k(α)), h)
            if d = 1 or d = h
                then H′ ← H′ ∪ {h}
                else H′ ← H′ ∪ {d, h/d}
        H ← H′
    output H
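The sketch below implements Algorithm EDF over $F = \mathbb{Z}_p$ (so $w = 1$), using the trace polynomial (20.4) when $p = 2$ and the power map (20.5), given later in this section for odd $p$, otherwise. It assumes the input really is a product of $\ell/k$ distinct monic irreducibles of degree $k$; if not, the loop may never terminate. Because Python lists are not hashable, the set $H$ is kept as a list. All function names are illustrative, and `rng` is any object with a `randrange` method.

```python
import random

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], p - 2, p) % p for c in a] if a else []

def poly_powmod(a, e, f, p):
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

def apply_M(a, k, h, p):
    """rep(M_k(a)) in Z_p[X]/(h): (20.4) if p = 2, (20.5) if p is odd (w = 1)."""
    if p == 2:
        total, t = [], a
        for _ in range(k):                      # a + a^2 + a^4 + ... + a^(2^(k-1))
            total = poly_add(total, t, p)
            t = poly_divmod(poly_mul(t, t, p), h, p)[1]
        return total
    return poly_sub(poly_powmod(a, (p ** k - 1) // 2, h, p), [1], p)

def edf(f, k, p, rng=random):
    """Algorithm EDF over Z_p: f is a product of deg(f)/k distinct irreducibles of degree k."""
    r = (len(f) - 1) // k
    H = [f]
    while len(H) < r:
        new_H = []
        for h in H:
            a = trim([rng.randrange(p) for _ in range(len(h) - 1)])   # random element of Z_p[X]/(h)
            d = poly_gcd(apply_M(a, k, h, p), h, p)
            if d == [1] or d == h:
                new_H.append(h)
            else:
                new_H += [d, poly_divmod(h, d, p)[0]]
        H = new_H
    return H

f = poly_mul([1, 1, 0, 1], [1, 0, 1, 1], 2)      # (X^3 + X + 1)(X^3 + X^2 + 1) over Z_2
print(sorted(edf(f, 3, 2, random.Random(0))))   # [[1, 0, 1, 1], [1, 1, 0, 1]]
```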

The correctness of the algorithm is clear from the above discussion. As for its expected running time, we can get a quick-and-dirty upper bound as follows:

• For a given $h$ and $\alpha \in F[X]/(h)$, the value $M_k(\alpha)$ can be computed using $O(k \deg(h)^2 \operatorname{len}(q))$ operations in $F$, and so the number of operations in $F$ performed in each iteration of the main loop is at most a constant times

$$k \operatorname{len}(q) \sum_{h \in H} \deg(h)^2 \le k \operatorname{len}(q) \Bigl(\sum_{h \in H} \deg(h)\Bigr)^2 = k\ell^2 \operatorname{len}(q).$$

• The expected number of iterations of the main loop until we get some non-trivial split is $O(1)$.

• The algorithm finishes after getting $r - 1$ non-trivial splits.

• Therefore, the total expected cost is $O(rk\ell^2 \operatorname{len}(q))$, or $O(\ell^3 \operatorname{len}(q))$, operations in $F$.

This analysis gives a bit of an over-estimate: it does not take into account the fact that we expect to get fairly “balanced” splits. For the purposes of analyzing the overall running time of the Cantor-Zassenhaus algorithm, this bound suffices; however, the following analysis gives a tight bound on the complexity of Algorithm EDF.

Theorem 20.7. In the case $p = 2$, Algorithm EDF uses an expected number of $O(k\ell^2 \operatorname{len}(q))$ operations in $F$.

Proof. We may assume $r \ge 2$. Let $L$ be the random variable that represents the number of iterations of the main loop of the algorithm. For $n \ge 1$, let $H_n$ be the random variable that represents the value of $H$ at the beginning of the $n$th loop iteration. For $i, j = 1, \ldots, r$, we define $L_{ij}$ to be the largest value of $n$ (with $1 \le n \le L$) such that $f_i \mid h$ and $f_j \mid h$ for some $h \in H_n$.

We first claim that $\mathsf{E}[L] = O(\operatorname{len}(r))$. To prove this claim, we make use of the fact (see Theorem 8.17) that

$$\mathsf{E}[L] = \sum_{n \ge 1} \mathsf{P}[L \ge n].$$

Now, $L \ge n$ if and only if for some $i, j$ with $1 \le i < j \le r$, we have $L_{ij} \ge n$. Moreover, if $f_i$ and $f_j$ have not been separated at the beginning of one loop iteration, then they will be separated at the beginning of the next with probability 1/2. It follows that

$$\mathsf{P}[L_{ij} \ge n] = 2^{-(n-1)}.$$

So we have

$$\mathsf{P}[L \ge n] \le \sum_{i<j} \mathsf{P}[L_{ij} \ge n] \le r^2 2^{-n}.$$

Therefore,

$$\mathsf{E}[L] = \sum_{n \ge 1} \mathsf{P}[L \ge n] = \sum_{n \le 2\log_2 r} \mathsf{P}[L \ge n] + \sum_{n > 2\log_2 r} \mathsf{P}[L \ge n] \le 2\log_2 r + \sum_{n > 2\log_2 r} r^2 2^{-n} \le 2\log_2 r + \sum_{n \ge 0} 2^{-n} = 2\log_2 r + 2,$$

which proves the claim.

As discussed in the paragraph above this theorem, the cost of each iteration of the main loop is $O(k\ell^2 \operatorname{len}(q))$ operations in $F$. Combining this with the fact that $\mathsf{E}[L] = O(\operatorname{len}(r))$, it follows that the expected number of operations in $F$ for the entire algorithm is $O(\operatorname{len}(r) k\ell^2 \operatorname{len}(q))$. This is significantly better than the above quick-and-dirty estimate, but is not quite the result we are after. For this, we have to work a little harder.

For each polynomial $h$ dividing $f$, define $\omega(h)$ to be the number of irreducible factors of $h$. Let us also define the random variable

$$S := \sum_{n=1}^{L} \sum_{h \in H_n} \omega(h)^2.$$

It is easy to see that the total number of operations performed by the algorithm is $O(Sk^3 \operatorname{len}(q))$, and so it will suffice to show that $\mathsf{E}[S] = O(r^2)$.

We claim that

$$S = \sum_{i,j} L_{ij},$$

where the sum is over all $i, j = 1, \ldots, r$. To see this, define $\delta_{ij}(h)$ to be 1 if both $f_i$ and $f_j$ divide $h$, and 0 otherwise. Then we have

$$S = \sum_n \sum_{h \in H_n} \sum_{i,j} \delta_{ij}(h) = \sum_{i,j} \sum_n \sum_{h \in H_n} \delta_{ij}(h) = \sum_{i,j} L_{ij},$$

which proves the claim.

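We can write

$$S = \sum_{i \ne j} L_{ij} + \sum_i L_{ii} = \sum_{i \ne j} L_{ij} + rL.$$

For $i \ne j$, we have

$$\mathsf{E}[L_{ij}] = \sum_{n \ge 1} \mathsf{P}[L_{ij} \ge n] = \sum_{n \ge 1} 2^{-(n-1)} = 2,$$

and so

$$\mathsf{E}[S] = \sum_{i \ne j} \mathsf{E}[L_{ij}] + r\,\mathsf{E}[L] = 2r(r - 1) + O(r \operatorname{len}(r)) = O(r^2).$$

That proves the theorem. □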

That completes the discussion of Algorithm EDF in the case $p = 2$. Now assume that $p > 2$, so that $p$, and hence also $q$, is odd. Algorithm EDF in this case is exactly the same as above, except that in this case, we define the polynomial $M_k$ as

$$M_k := X^{(q^k - 1)/2} - 1 \in F[X]. \tag{20.5}$$

Just as before, for $\alpha \in E$ with $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, we have

$$\theta(M_k(\alpha)) = M_k(\theta(\alpha)) = (M_k(\alpha_1), \ldots, M_k(\alpha_r)).$$
Note that each group $E_i^*$ is a cyclic group of order $q^k - 1$, and therefore, the image of the $((q^k - 1)/2)$-power map on $E_i^*$ is $\{\pm 1\}$.

Now, suppose we choose $\alpha \in E$ at random. Then if $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, the family of random variables $\{\alpha_i\}_{i=1}^r$ is mutually independent, with each $\alpha_i$ uniformly distributed over $E_i$. It follows that the family of random variables $\{M_k(\alpha_i)\}_{i=1}^r$ is mutually independent. If $\alpha_i = 0$, which happens with probability $1/q^k$, then $M_k(\alpha_i) = -1$; otherwise, $\alpha_i^{(q^k - 1)/2}$ is uniformly distributed over $\{\pm 1\}$, and so $M_k(\alpha_i)$ is uniformly distributed over $\{0, -2\}$. That is to say,

$$M_k(\alpha_i) = \begin{cases} 0 & \text{with probability } (q^k - 1)/2q^k, \\ -1 & \text{with probability } 1/q^k, \\ -2 & \text{with probability } (q^k - 1)/2q^k. \end{cases}$$
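This three-case distribution can be checked by brute force in a small field. In the Python sketch below (illustrative helpers, coefficient lists lowest degree first), $p = 3$, $k = 2$, and $f = X^2 + 1$, which is irreducible over $\mathbb{Z}_3$, so $E_i$ has $q^k = 9$ elements; the code tallies $M_2(\alpha) = \alpha^4 - 1$ over all nine $\alpha$. In $\mathbb{Z}_3$ the value $-1$ is stored as `2` and $-2$ as `1`, and zero is the empty list.

```python
from collections import Counter

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_add(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def poly_sub(a, b, p):
    return poly_add(a, [(-c) % p for c in b], p)

def poly_mul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)

def poly_divmod(a, b, p):
    a, inv = list(a), pow(b[-1], p - 2, p)
    quo = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, d = a[-1] * inv % p, len(a) - len(b)
        quo[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(quo), a

def poly_powmod(a, e, f, p):
    result, a = [1], poly_divmod(a, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, a, p), f, p)[1]
        a = poly_divmod(poly_mul(a, a, p), f, p)[1]
        e >>= 1
    return result

f, p, k = [1, 0, 1], 3, 2                       # X^2 + 1 over Z_3, so q^k = 9
elements = [trim([n % 3, n // 3]) for n in range(9)]
counts = Counter(tuple(poly_sub(poly_powmod(a, (p ** k - 1) // 2, f, p), [1], p))
                 for a in elements)
print(counts[()], counts[(2,)], counts[(1,)])   # 4 1 4: values 0, -1, -2
```

The counts 4, 1, 4 out of 9 match the probabilities $(q^k - 1)/2q^k = 4/9$, $1/q^k = 1/9$, and $4/9$.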


Thus, if $g := \operatorname{rep}(M_k(\alpha))$, then $\gcd(g, f)$ will be the product of those factors $f_i$ of $f$ such that $M_k(\alpha_i) = 0$. We will fail to get a non-trivial factorization only if the $M_k(\alpha_i)$ are either all zero or all non-zero. Assume $r \ge 2$. Consider the worst case, namely, when $r = 2$. In this case, a simple calculation shows that the probability that we fail to split these two factors is

$$\left(\frac{q^k - 1}{2q^k}\right)^2 + \left(\frac{q^k + 1}{2q^k}\right)^2 = \frac{1}{2}\left(1 + \frac{1}{q^{2k}}\right).$$

The (very) worst case is when $q^k = 3$, in which case the probability of failure is at most 5/9.
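As a quick check of this calculation, the following Python snippet evaluates the failure probability exactly with `fractions.Fraction`, writing $Q$ for $q^k$. The function name `split_failure_prob` is illustrative; it simply encodes the two cases "both zero" and "both non-zero" from the distribution above.

```python
from fractions import Fraction

def split_failure_prob(Q):
    """P[one random alpha fails to separate two degree-k factors], with Q = q^k odd."""
    zero = Fraction(Q - 1, 2 * Q)      # P[M_k(alpha_i) = 0]
    nonzero = 1 - zero                 # P[M_k(alpha_i) in {-1, -2}] = (Q + 1)/(2Q)
    return zero ** 2 + nonzero ** 2    # both zero, or both non-zero

print(split_failure_prob(3))   # 5/9
print(split_failure_prob(5))   # 13/25
```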

The same quick-and-dirty analysis given just above Theorem 20.7 applies here as well, but just as before, we can do better:


Theorem 20.8. In the case $p > 2$, Algorithm EDF uses an expected number of $O(k\ell^2 \operatorname{len}(q))$ operations in $F$.


Proof. The analysis is essentially the same as in the case $p = 2$, except that now the probability that we fail to split a given pair of irreducible factors is at most 5/9, rather than equal to 1/2. The details are left as an exercise for the reader. □


20.4.3 Analysis of the whole algorithm 

Given an arbitrary monic square-free polynomial $f \in F[X]$ of degree $\ell > 0$, the distinct degree factorization step takes $O(\ell^3 \operatorname{len}(q))$ operations in $F$. This step produces a number of polynomials that must be further subjected to equal degree factorization. If there are $t$ such polynomials, where the $i$th polynomial has degree $\ell_i$, for $i = 1, \ldots, t$, then $\sum_{i=1}^t \ell_i = \ell$. Now, the equal degree factorization step for the $i$th polynomial takes an expected number of $O(\ell_i^3 \operatorname{len}(q))$ operations in $F$ (actually, our initial, “quick and dirty” estimate is good enough here), and so it follows that the total expected cost of all the equal degree factorization steps is $O(\sum_i \ell_i^3 \operatorname{len}(q))$, which is $O(\ell^3 \operatorname{len}(q))$, operations in $F$. Putting this all together, we conclude:

Theorem 20.9. The Cantor-Zassenhaus factoring algorithm uses an expected number of $O(\ell^3 \operatorname{len}(q))$ operations in $F$.

This bound is tight, since in the worst case, when the input is irreducible, the algorithm really does do this much work. Also, we have assumed the input to the Cantor-Zassenhaus algorithm is a square-free polynomial. However, we may use Algorithm SFD as a preprocessing step to ensure that this is the case. Even if we include the cost of this preprocessing step, the running time estimate in Theorem 20.9 remains valid.


Exercise 20.6. Show how to modify Algorithm DDF so that the main loop halts as soon as $2k > \deg(f)$.

Exercise 20.7. Suppose that in Algorithm EDF, we replace the two lines

    for each h ∈ H do
        choose α ∈ F[X]/(h) at random

by the following:

    choose a_0, ..., a_{2k−1} ∈ F at random
    g ← ∑_{j=0}^{2k−1} a_j X^j ∈ F[X]
    for each h ∈ H do
        α ← [g]_h ∈ F[X]/(h)

Show that the expected running time bound of Theorem 20.7 still holds (you may assume $p = 2$ for simplicity).

Exercise 20.8. This exercise extends the techniques developed in Exercise 20.1. Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$, and let $\xi := [X]_f \in E$, where $E := F[X]/(f)$. For each integer $m > 0$, define polynomials

$$T_m := X + X^q + \cdots + X^{q^{m-1}} \in F[X] \quad\text{and}\quad N_m := X \cdot X^q \cdots X^{q^{m-1}} \in F[X].$$

(a) Given as input $\xi^{q^m} \in E$ and $\xi^{q^{m'}} \in E$, where $m$ and $m'$ are positive integers, along with $T_m(\alpha)$ and $T_{m'}(\alpha)$, for some $\alpha \in E$, show how to compute the values $\xi^{q^{m+m'}}$ and $T_{m+m'}(\alpha)$, using $O(\ell^{2.5})$ operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$.

(b) Given as input $\xi^q \in E$, $\alpha \in E$, and a positive integer $m$, show how to compute (using part (a)) the value $T_m(\alpha)$, using $O(\ell^{2.5} \operatorname{len}(m))$ operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$.

(c) Repeat parts (a) and (b), except with “N” in place of “T.”

Exercise 20.9. Using the result of the previous exercise, show how to implement
Algorithm EDF so that it uses an expected number of

$$O(\operatorname{len}(k)\ell^{2.5} + \ell^2 \operatorname{len}(q))$$

operations in $F$, and space for $O(\ell^{1.5})$ elements of $F$.

Exercise 20.10. This exercise depends on the concepts and results in §18.6. Let
$E$ be an extension field of degree $\ell$ over $F$, specified by an irreducible polynomial
of degree $\ell$ over $F$. Design and analyze an efficient probabilistic algorithm that
finds a normal basis for $E$ over $F$ (see Exercise 19.14). Hint: there are a number
of approaches to solving this problem; one way is to start by factoring $X^\ell - 1$
over $F$, and then turn the construction in Theorem 18.12 into an efficient probabilistic
procedure; if you mimic Exercise 11.2, your entire algorithm should use
$O(\ell^3 \operatorname{len}(\ell) \operatorname{len}(q))$ operations in $F$ (or $O(\operatorname{len}(r)\ell^3 \operatorname{len}(q))$ operations, where $r$ is
the number of distinct irreducible factors of $X^\ell - 1$ over $F$).


20.5 Factoring polynomials: Berlekamp’s algorithm 

We now develop an alternative algorithm, due to Berlekamp, for factoring a poly- 
nomial over the finite field F into irreducibles. We shall assume that the input 
polynomial is square-free, using Algorithm SFD in §20.3 as a preprocessing step, 
if necessary. 

Let us now assume we have a monic square-free polynomial $f \in F[X]$ of degree
$\ell > 0$ that we want to factor into irreducibles. We first present the mathematical
ideas underpinning the algorithm.

Let $E$ be the $F$-algebra $F[X]/(f)$. Let $\sigma$ be the Frobenius map on $E$ over $F$,
which maps $\alpha \in E$ to $\alpha^q \in E$. We know that $\sigma$ is an $F$-algebra homomorphism (see
Theorem 19.7). Consider the subalgebra $B$ of $E$ fixed by $\sigma$ (see Theorem 16.6).
Thus,

$$B = \{\alpha \in E : \alpha^q = \alpha\}.$$

The subalgebra $B$ is called the Berlekamp subalgebra of $E$. Let us take a closer
look at it. Suppose that $f$ factors into irreducibles as

$$f = f_1 \cdots f_r,$$

and let

$$\theta : E \to E_1 \times \cdots \times E_r, \qquad [g]_f \mapsto ([g]_{f_1}, \ldots, [g]_{f_r})$$

be the $F$-algebra isomorphism from the Chinese remainder theorem, where
$E_i := F[X]/(f_i)$ is an extension field of $F$ of finite degree for $i = 1, \ldots, r$. Now, for
$\alpha \in E$, if $\theta(\alpha) = (\alpha_1, \ldots, \alpha_r)$, then we have $\alpha^q = \alpha$ if and only if $\alpha_i^q = \alpha_i$ for
$i = 1, \ldots, r$; moreover, by Theorem 19.8, we know that for all $\alpha_i \in E_i$, we have
$\alpha_i^q = \alpha_i$ if and only if $\alpha_i \in F$. Thus, we may characterize $B$ as follows:

$$B = \{\theta^{-1}(c_1, \ldots, c_r) : c_1, \ldots, c_r \in F\}.$$

Since $B$ is a subalgebra of $E$, then as $F$-vector spaces, $B$ is a subspace of $E$.
Of course, $E$ has dimension $\ell$ over $F$, with the natural basis $\{\xi^{i-1}\}_{i=1}^{\ell}$, where
$\xi := [X]_f$. As for the Berlekamp subalgebra, from the above characterization of $B$,
it is evident that the elements

$$\theta^{-1}(1, 0, \ldots, 0), \quad \theta^{-1}(0, 1, 0, \ldots, 0), \quad \ldots, \quad \theta^{-1}(0, \ldots, 0, 1)$$

form a basis for $B$ over $F$, and hence, $B$ has dimension $r$ over $F$.
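As a sanity check on this dimension count, here is a brute-force sketch. It assumes $F = \mathbb{Z}_p$ is a prime field (so $q = p$) and represents elements of $E$ by coefficient lists, lowest degree first; the function names are illustrative, not from the text. Since it enumerates all $p^\ell$ elements of $E$, it is only usable for tiny examples, where it should find exactly $p^r$ fixed points of the Frobenius map.

```python
from itertools import product

# Polynomials over Z_p are coefficient lists, lowest degree first; [] is zero.

def trim(a):
    """Drop high-order zero coefficients in place and return a."""
    while a and a[-1] == 0:
        a.pop()
    return a

def mul(a, b, p):
    """Product of a and b over Z_p."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)

def divmod_poly(a, b, p):
    """Quotient and remainder of a divided by a nonzero b over Z_p (p prime)."""
    a = list(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], p - 2, p)          # inverse of the leading coefficient of b
    while a and len(a) >= len(b):
        c = a[-1] * inv % p
        s = len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q), a

def powmod(a, e, f, p):
    """a^e mod f over Z_p by repeated squaring."""
    result, base = [1], divmod_poly(a, f, p)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base, p), f, p)[1]
        base = divmod_poly(mul(base, base, p), f, p)[1]
        e >>= 1
    return result

def berlekamp_dimension_brute(f, p):
    """Count alpha in Z_p[X]/(f) with alpha^p == alpha (here q = p); expect p^r."""
    count = 0
    for coeffs in product(range(p), repeat=len(f) - 1):
        alpha = trim(list(coeffs))      # an element of E, already reduced mod f
        if powmod(alpha, p, f, p) == alpha:
            count += 1
    return count
```

For instance, over $\mathbb{Z}_5$ the polynomial $(X+1)(X+2)(X^2+2)$ has $r = 3$ irreducible factors ($X^2 + 2$ is irreducible because $-2$ is not a square modulo 5), so the count should be $5^3 = 125$.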

Now we come to the actual factoring algorithm. 


Stage 1: Construct a basis for B 

The first stage of Berlekamp’s factoring algorithm constructs a basis for $B$ over $F$.
We can easily do this using Gaussian elimination, as follows. Let $\rho : E \to E$ be
the map that sends $\alpha \in E$ to $\sigma(\alpha) - \alpha = \alpha^q - \alpha$. Since $\sigma$ is an $F$-linear map, the
map $\rho$ is also $F$-linear. Moreover, the kernel of $\rho$ is none other than the Berlekamp
subalgebra $B$. So to find a basis for $B$, we simply need to find a basis for the kernel
of $\rho$ using Gaussian elimination over $F$, as in §14.4.

To perform the Gaussian elimination, we need to choose a basis $S$ for $E$ over
$F$, and construct the matrix $Q := \operatorname{Mat}_{S,S}(\rho) \in F^{\ell \times \ell}$, that is, the matrix of $\rho$ with
respect to this basis, as in §14.2, so that evaluation of $\rho$ corresponds to multiplying
a row vector on the right by $Q$. We are free to choose a basis in any convenient
way, and the most convenient basis, of course, is $S := \{\xi^{i-1}\}_{i=1}^{\ell}$, since for computational
purposes, we already represent an element $\alpha \in E$ by its coordinate vector
$\operatorname{Vec}_S(\alpha)$. The matrix $Q$, then, is the $\ell \times \ell$ matrix whose $i$th row, for $i = 1, \ldots, \ell$, is
$\operatorname{Vec}_S(\rho(\xi^{i-1}))$. Note that if $\alpha := \xi^q$, then
$\rho(\xi^{i-1}) = (\xi^{i-1})^q - \xi^{i-1} = \alpha^{i-1} - \xi^{i-1}$. This observation allows us to construct the rows of $Q$ by first computing
$\xi^q$ via repeated squaring, and then just computing successive powers of $\xi^q$.

After we construct the matrix $Q$, we apply Gaussian elimination to get row vectors
$v_1, \ldots, v_r$ that form a basis for the row null space of $Q$. It is at this point that
our algorithm actually discovers the number $r$ of irreducible factors of $f$. Our basis
for $B$ is $\{\beta_i\}_{i=1}^{r}$, where $\operatorname{Vec}_S(\beta_i) = v_i$ for $i = 1, \ldots, r$.

Putting this all together, we have the following algorithm to compute a basis for 
the Berlekamp subalgebra. 

Algorithm B1. On input $f$, where $f \in F[X]$ is a monic square-free polynomial
of degree $\ell > 0$, do the following, where $E := F[X]/(f)$, $\xi := [X]_f \in E$, and
$S := \{\xi^{i-1}\}_{i=1}^{\ell}$:

    let Q be an ℓ × ℓ matrix over F (initially with undefined entries)
    compute α ← ξ^q using repeated squaring
    β ← 1_E
    for i ← 1 to ℓ do    // invariant: β = α^{i−1} = (ξ^{i−1})^q
        Row_i(Q) ← Vec_S(β), Q(i, i) ← Q(i, i) − 1, β ← βα
    compute a basis {v_i}_{i=1}^r of the row null space of Q using
        Gaussian elimination
    for i = 1, ..., r do β_i ← Vec_S^{−1}(v_i)
    output {β_i}_{i=1}^r
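A minimal Python sketch of Algorithm B1 follows. It assumes $F = \mathbb{Z}_p$ is a prime field (so $q = p$), stores polynomials as coefficient lists with the lowest degree first, and uses hypothetical helper names (`powmod`, `left_null_space`, `berlekamp_basis`). Rows of $Q$ come from successive powers of $\alpha = \xi^q$, as in the loop above (with 0-based row indices), and the row null space of $Q$ is obtained as the null space of $Q^\top$.

```python
# Sketch of Algorithm B1 over a prime field F = Z_p (so q = p).
# Polynomials are coefficient lists, lowest degree first; [] is zero.

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def mul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)

def divmod_poly(a, b, p):
    """Quotient and remainder of a by a nonzero b over Z_p."""
    a = list(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], p - 2, p)
    while a and len(a) >= len(b):
        c = a[-1] * inv % p
        s = len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q), a

def powmod(a, e, f, p):
    """a^e mod f by repeated squaring."""
    result, base = [1], divmod_poly(a, f, p)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base, p), f, p)[1]
        base = divmod_poly(mul(base, base, p), f, p)[1]
        e >>= 1
    return result

def left_null_space(Q, p):
    """Basis of {v : v Q = 0} over Z_p, via Gauss-Jordan elimination on Q transposed."""
    n = len(Q)
    A = [[Q[j][i] % p for j in range(n)] for i in range(n)]   # A = Q^T
    pivots, row = [], 0
    for col in range(n):
        piv = next((i for i in range(row, n) if A[i][col]), None)
        if piv is None:
            continue
        A[row], A[piv] = A[piv], A[row]
        inv = pow(A[row][col], p - 2, p)
        A[row] = [x * inv % p for x in A[row]]
        for i in range(n):
            if i != row and A[i][col]:
                c = A[i][col]
                A[i] = [(x - c * y) % p for x, y in zip(A[i], A[row])]
        pivots.append(col)
        row += 1
    basis = []
    for free in [c for c in range(n) if c not in pivots]:
        v = [0] * n
        v[free] = 1                        # one free coordinate set to 1
        for i, pc in enumerate(pivots):
            v[pc] = -A[i][free] % p        # solve for the pivot coordinates
        basis.append(v)
    return basis

def berlekamp_basis(f, p):
    """Basis of the Berlekamp subalgebra of Z_p[X]/(f), f monic square-free."""
    l = len(f) - 1
    alpha = powmod([0, 1], p, f, p)           # alpha = xi^q
    Q, beta = [], [1]                          # beta runs through alpha^0, alpha^1, ...
    for i in range(l):
        row = beta + [0] * (l - len(beta))     # Vec_S(alpha^i) = Vec_S((xi^i)^q)
        row[i] = (row[i] - 1) % p              # minus Vec_S(xi^i): row i is Vec_S(rho(xi^i))
        Q.append(row)
        beta = divmod_poly(mul(beta, alpha, p), f, p)[1]
    return [trim(v) for v in left_null_space(Q, p)]
```

For example, for $f = (X+1)(X+2)(X^2+2)$ over $\mathbb{Z}_5$ the sketch should return three basis vectors, each fixed by the $p$-th power map modulo $f$.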

The correctness of Algorithm B1 is clear from the above discussion. As for the
running time:

Theorem 20.10. Algorithm B1 uses $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations in $F$.

Proof. This is just a matter of counting. The computation of $\alpha$ takes $O(\operatorname{len}(q))$
operations in $E$ using repeated squaring, and hence $O(\ell^2 \operatorname{len}(q))$ operations in $F$.
To build the matrix $Q$, we have to perform an additional $O(\ell)$ operations in $E$ to
compute the successive powers of $\alpha$, which translates into $O(\ell^3)$ operations in $F$.
Finally, the cost of Gaussian elimination is an additional $O(\ell^3)$ operations in $F$. □


Stage 2: Splitting with a basis for B 

The second stage of Berlekamp’s factoring algorithm is a probabilistic procedure
that factors $f$ using a basis $\{\beta_i\}_{i=1}^{r}$ for $B$. As we did with Algorithm EDF in
§20.4.2, we begin by discussing how to efficiently split $f$ into two non-trivial factors,
and then we present a somewhat more elaborate algorithm that completely
factors $f$.

Let $M_1 \in F[X]$ be the polynomial defined by (20.4) and (20.5); that is,

$$M_1 := \begin{cases} \sum_{j=0}^{w-1} X^{2^j} & \text{if } p = 2, \\ X^{(q-1)/2} - 1 & \text{if } p > 2. \end{cases}$$


Using our basis for $B$, we can easily generate a random element $\beta$ of $B$ by simply
choosing $c_1, \ldots, c_r$ at random, and computing $\beta := \sum_{i} c_i \beta_i$. If $\theta(\beta) = (b_1, \ldots, b_r)$,
then the family of random variables $\{b_i\}_{i=1}^{r}$ is mutually independent, with each $b_i$
uniformly distributed over $F$. Just as in Algorithm EDF, $\gcd(\operatorname{rep}(M_1(\beta)), f)$ will
be a non-trivial factor of $f$ with probability at least 1/2, if $p = 2$, and probability
at least 4/9, if $p > 2$.
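To make the splitting step concrete, here is a sketch for an odd prime field $F = \mathbb{Z}_p$, so that $q = p$ and $M_1 = X^{(p-1)/2} - 1$. The names (`try_split`, `gcd_poly`, and the helpers) are illustrative only. The function takes a representative $g$ of $\beta = [g]_f$ and returns the monic $\gcd(\operatorname{rep}(M_1(\beta)), f)$.

```python
# Polynomials over Z_p (p an odd prime) as coefficient lists, lowest degree first.

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def sub(a, b, p):
    """a - b over Z_p."""
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def mul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)

def divmod_poly(a, b, p):
    """Quotient and remainder of a by a nonzero b over Z_p."""
    a = list(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], p - 2, p)
    while a and len(a) >= len(b):
        c = a[-1] * inv % p
        s = len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q), a

def powmod(a, e, f, p):
    result, base = [1], divmod_poly(a, f, p)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base, p), f, p)[1]
        base = divmod_poly(mul(base, base, p), f, p)[1]
        e >>= 1
    return result

def gcd_poly(a, b, p):
    """Monic gcd of a and b over Z_p (Euclid's algorithm)."""
    while b:
        a, b = b, divmod_poly(a, b, p)[1]
    if not a:
        return []
    inv = pow(a[-1], p - 2, p)
    return [x * inv % p for x in a]

def try_split(g, f, p):
    """gcd(rep(M_1(beta)), f) for beta = [g]_f, where M_1 = X^((p-1)/2) - 1."""
    m = sub(powmod(g, (p - 1) // 2, f, p), [1], p)
    return gcd_poly(m, f, p)
```

For $f = (X-1)(X-2) = X^2 + 2X + 2$ over $\mathbb{Z}_5$ and $\beta = \xi$, one gets $\xi^2 - 1 \equiv 3\xi + 2 \pmod{f}$, whose gcd with $f$ is the factor $X - 1$ (the list `[4, 1]`).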

That is the basic splitting strategy. We turn this into an algorithm to completely
factor $f$ using the same technique of iterative refinement that was used in Algorithm
EDF. That is, at any stage of the algorithm, we have a partial factorization
$f = \prod_{h \in H} h$, which we try to refine by attempting to split each $h \in H$ using
the strategy outlined above. One technical difficulty is that to split such a polynomial
$h$, we need to efficiently generate a random element of the Berlekamp
subalgebra of $F[X]/(h)$. A particularly efficient way to do this is to use our
basis for the Berlekamp subalgebra of $F[X]/(f)$ to generate a random element
of the Berlekamp subalgebra of $F[X]/(h)$ for all $h \in H$ simultaneously. Let
$g_i := \operatorname{rep}(\beta_i)$ for $i = 1, \ldots, r$. If we choose $c_1, \ldots, c_r \in F$ at random, and set
$g := c_1 g_1 + \cdots + c_r g_r$, then $[g]_f$ is a random element of the Berlekamp subalgebra
of $F[X]/(f)$, and by the Chinese remainder theorem, it follows that the family
of random variables $\{[g]_h\}_{h \in H}$ is mutually independent, with each $[g]_h$ uniformly
distributed over the Berlekamp subalgebra of $F[X]/(h)$.

Here is the algorithm for completely factoring a polynomial, given a basis for 
the corresponding Berlekamp subalgebra. 

Algorithm B2. On input $f, \{\beta_i\}_{i=1}^{r}$, where $f \in F[X]$ is a monic square-free polynomial
of degree $\ell > 0$, and $\{\beta_i\}_{i=1}^{r}$ is a basis for the Berlekamp subalgebra of
$F[X]/(f)$, do the following, where $g_i := \operatorname{rep}(\beta_i)$ for $i = 1, \ldots, r$:

    H ← {f}
    while |H| < r do
        choose c_1, ..., c_r ∈ F at random
        g ← c_1 g_1 + ··· + c_r g_r ∈ F[X]
        H′ ← ∅
        for each h ∈ H do
            β ← [g]_h ∈ F[X]/(h)
            d ← gcd(rep(M_1(β)), h)
            if d = 1 or d = h
                then H′ ← H′ ∪ {h}
                else H′ ← H′ ∪ {d, h/d}
        H ← H′
    output H
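The refinement loop of Algorithm B2 can be sketched in Python as follows, again assuming an odd prime field $F = \mathbb{Z}_p$ (so $q = p$), with the basis supplied as the coefficient lists of $g_i = \operatorname{rep}(\beta_i)$. The function names are hypothetical, and the optional `rng` argument only lets a caller seed the random choices of $c_1, \ldots, c_r$.

```python
import random

# Polynomials over Z_p (p an odd prime) as coefficient lists, lowest degree first.

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def sub(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def mul(a, b, p):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return trim(c)

def divmod_poly(a, b, p):
    a = list(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], p - 2, p)
    while a and len(a) >= len(b):
        c = a[-1] * inv % p
        s = len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q), a

def powmod(a, e, f, p):
    result, base = [1], divmod_poly(a, f, p)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base, p), f, p)[1]
        base = divmod_poly(mul(base, base, p), f, p)[1]
        e >>= 1
    return result

def gcd_poly(a, b, p):
    while b:
        a, b = b, divmod_poly(a, b, p)[1]
    if not a:
        return []
    inv = pow(a[-1], p - 2, p)
    return [x * inv % p for x in a]

def berlekamp_b2(f, basis, p, rng=random):
    """Refine {f} until it has r = len(basis) factors (Algorithm B2 with q = p odd)."""
    r = len(basis)
    H = [f]
    while len(H) < r:
        g = [0] * (len(f) - 1)                   # g <- c_1 g_1 + ... + c_r g_r
        for gi in basis:
            c = rng.randrange(p)
            for k, x in enumerate(gi):
                g[k] = (g[k] + c * x) % p
        g = trim(g)
        H_new = []
        for h in H:
            beta = divmod_poly(g, h, p)[1]                        # [g]_h
            m = sub(powmod(beta, (p - 1) // 2, h, p), [1], p)     # rep(M_1(beta))
            d = gcd_poly(m, h, p)
            if len(d) == 1 or d == h:                             # d = 1 or d = h
                H_new.append(h)
            else:
                H_new += [d, divmod_poly(h, d, p)[0]]             # {d, h/d}
        H = H_new
    return H
```

When $f$ splits into distinct linear factors over $\mathbb{Z}_p$, every element of $E$ is fixed by the Frobenius map, so $B = E$ and $\{1, \xi, \ldots, \xi^{\ell-1}\}$ is a valid basis; for $(X-1)(X-2)(X-3)$ over $\mathbb{Z}_5$ one can therefore pass `[[1], [0, 1], [0, 0, 1]]`.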
The correctness of the algorithm is clear. As for its expected running time, we
can get a quick-and-dirty upper bound as follows:

• The cost of generating $g$ in each loop iteration is $O(r\ell)$ operations in $F$.
For a given $h$, the cost of computing $\beta := [g]_h \in F[X]/(h)$ is $O(\ell \deg(h))$
operations in $F$, and the cost of computing $M_1(\beta)$ is $O(\deg(h)^2 \operatorname{len}(q))$
operations in $F$. Therefore, using $r \le \ell$ and $\sum_{h \in H} \deg(h) = \ell$, the number of
operations in $F$ performed in each iteration of the main loop is at most a constant times

$$r\ell + \ell \sum_{h \in H} \deg(h) + \operatorname{len}(q) \sum_{h \in H} \deg(h)^2 \le 2\ell^2 + \operatorname{len}(q) \Bigl(\sum_{h \in H} \deg(h)\Bigr)^2 = O(\ell^2 \operatorname{len}(q)).$$

• The expected number of iterations of the main loop until we get some non-trivial
split is $O(1)$.

• The algorithm finishes after getting $r - 1$ non-trivial splits.

• Therefore, the total expected cost is $O(r\ell^2 \operatorname{len}(q))$ operations in $F$.

A more careful analysis reveals: 

Theorem 20.11. Algorithm B2 uses an expected number of

$$O(\operatorname{len}(r)\ell^2 \operatorname{len}(q))$$

operations in $F$.

Proof. The proof follows the same line of reasoning as the analysis of Algorithm
EDF. Indeed, using the same argument as was used there, the expected
number of iterations of the main loop is $O(\operatorname{len}(r))$. As discussed in the paragraph
above this theorem, the cost per loop iteration is $O(\ell^2 \operatorname{len}(q))$ operations in $F$. The
theorem follows. □

The bound in the above theorem is tight (see Exercise 20.11 below): unlike 
Algorithm EDF, we cannot make the multiplicative factor of len(r) go away. 

Putting together Algorithms B1 and B2, we get Berlekamp’s complete factoring 
algorithm. The running time bound is easily estimated from the results already 
proved: 

Theorem 20.12. Berlekamp’s factoring algorithm uses an expected number of
$O(\ell^3 + \ell^2 \operatorname{len}(\ell) \operatorname{len}(q))$ operations in $F$.

We have assumed the input to Berlekamp’s algorithm is a square-free polynomial.
However, we may use Algorithm SFD as a preprocessing step to ensure that
this is the case. Even if we include the cost of this preprocessing step, the running
time estimate in Theorem 20.12 remains valid.

So we see that Berlekamp’s algorithm is faster than the Cantor-Zassenhaus algorithm,
whose expected operation count is $O(\ell^3 \operatorname{len}(q))$. The speed advantage of
Berlekamp’s algorithm grows as $q$ gets large. The one disadvantage of Berlekamp’s
algorithm is space: it requires space for $O(\ell^2)$ elements of $F$, while the
Cantor-Zassenhaus algorithm requires space for only $O(\ell)$ elements of $F$. One can in fact
implement the Cantor-Zassenhaus algorithm so that it uses $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations
in $F$, while using space for only $O(\ell^{1.5})$ elements of $F$ (see Exercise 20.13
below).


Exercise 20.11. Give an example of a family of input polynomials that cause
Algorithm B2 to use an expected number of at least $\Omega(\ell^2 \operatorname{len}(\ell) \operatorname{len}(q))$ operations
in $F$. Assume that computing $M_1(\beta)$ for $\beta \in F[X]/(h)$ takes $\Omega(\deg(h)^2 \operatorname{len}(q))$
operations in $F$.


Exercise 20.12. Using the ideas behind Berlekamp’s factoring algorithm, devise
a deterministic irreducibility test that, given a monic polynomial of degree $\ell$ over
$F$, uses $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations in $F$.


Exercise 20.13. This exercise develops a variant of the Cantor-Zassenhaus
algorithm that uses $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations in $F$, while using space for only
$O(\ell^{1.5})$ elements of $F$. By making use of the variant of Algorithm EDF discussed
in Exercise 20.9, our problem is reduced to that of implementing Algorithm DDF
within the stated time and space bounds, assuming that the input polynomial is
square-free.

(a) Show that for all non-negative integers $i, j$, with $i \ne j$, the irreducible polynomials
in $F[X]$ that divide $X^{q^i} - X^{q^j}$ are precisely those whose degree
divides $i - j$.

(b) Let $f \in F[X]$ be a monic polynomial of degree $\ell > 0$, and let $m = O(\ell^{1/2})$.
Let $\xi := [X]_f \in E$, where $E := F[X]/(f)$. Show how to compute

$$\xi^{q}, \xi^{q^2}, \ldots, \xi^{q^{m-1}} \in E \quad\text{and}\quad \xi^{q^m}, \xi^{q^{2m}}, \ldots, \xi^{q^{(m-1)m}} \in E$$

using $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations in $F$, and space for $O(\ell^{1.5})$ elements of
$F$.

(c) Combine the results of parts (a) and (b) to implement Algorithm DDF on
square-free inputs of degree $\ell$, so that it uses $O(\ell^3 + \ell^2 \operatorname{len}(q))$ operations
in $F$, and space for $O(\ell^{1.5})$ elements of $F$.
20.6 Deterministic factorization algorithms (*)

The algorithms of Cantor and Zassenhaus and of Berlekamp are probabilistic. The
exercises below develop a deterministic variant of the Cantor-Zassenhaus algorithm.
(One can also develop deterministic variants of Berlekamp’s algorithm,
with similar complexity.)

This algorithm is only practical for finite fields of small characteristic, and is
anyway mainly of theoretical interest, since from a practical perspective, there is
nothing wrong with the above probabilistic method. In all of these exercises, we
assume that we have access to a basis for $F$ as a vector space over $\mathbb{Z}_p$.

To make the Cantor-Zassenhaus algorithm deterministic, we only need to
develop a deterministic variant of Algorithm EDF, as Algorithm DDF is already
deterministic.

Exercise 20.14. Let $f = f_1 \cdots f_r$, where the $f_i$’s are distinct monic irreducible
polynomials in $F[X]$. Assume that $r > 1$, and let $\ell := \deg(f)$. For this exercise,
the degrees of the $f_i$’s need not be the same. For an intermediate field $F'$, with
$\mathbb{Z}_p \subseteq F' \subseteq F$, let us call a set $S = \{\lambda_1, \ldots, \lambda_s\}$, where each $\lambda_u \in F[X]$ with
$\deg(\lambda_u) < \ell$, a separating set for $f$ over $F'$ if the following conditions hold:

• for $i = 1, \ldots, r$ and $u = 1, \ldots, s$, there exists $c_{ui} \in F'$ such that
$\lambda_u \equiv c_{ui} \pmod{f_i}$, and

• for every pair of distinct indices $i, j$, with $1 \le i < j \le r$, there exists
$u = 1, \ldots, s$ such that $c_{ui} \ne c_{uj}$.

Show that if $S$ is a separating set for $f$ over $\mathbb{Z}_p$, then the following algorithm
completely factors $f$ using $O(p|S|\ell^2)$ operations in $F$.

    H ← {f}
    for each λ ∈ S do
        for each a ∈ Z_p do
            H′ ← ∅
            for each h ∈ H do
                d ← gcd(λ − a, h)
                if d = 1 or d = h
                    then H′ ← H′ ∪ {h}
                    else H′ ← H′ ∪ {d, h/d}
            H ← H′
    output H
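For experimentation, the refinement procedure above can be transcribed into Python. This sketch makes the simplifying assumption $F = \mathbb{Z}_p$ (the exercise allows a larger $F$), represents polynomials as coefficient lists with the lowest degree first, and uses hypothetical helper names; it is not a solution to the exercise, which asks for a proof.

```python
# Polynomials over Z_p as coefficient lists, lowest degree first; [] is zero.

def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def sub(a, b, p):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
                 for i in range(n)])

def divmod_poly(a, b, p):
    a = list(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], p - 2, p)
    while a and len(a) >= len(b):
        c = a[-1] * inv % p
        s = len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % p
        trim(a)
    return trim(q), a

def gcd_poly(a, b, p):
    """Monic gcd over Z_p."""
    while b:
        a, b = b, divmod_poly(a, b, p)[1]
    if not a:
        return []
    inv = pow(a[-1], p - 2, p)
    return [x * inv % p for x in a]

def separate(f, S, p):
    """Refine {f} using every lambda in S and every constant a in Z_p."""
    H = [f]
    for lam in S:
        for a in range(p):
            lam_minus_a = sub(lam, [a], p)
            H_new = []
            for h in H:
                d = gcd_poly(lam_minus_a, h, p)
                if len(d) == 1 or d == h:        # d = 1 or d = h: no split
                    H_new.append(h)
                else:
                    H_new += [d, divmod_poly(h, d, p)[0]]
            H = H_new
    return H
```

For $f = X(X+1)(X+2) = X^3 + 2X$ over $\mathbb{Z}_3$, the single polynomial $\lambda = X$ forms a separating set, since $X \equiv 0, 2, 1$ modulo the three factors, and `separate` recovers all three linear factors.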

Exercise 20.15. Let $f$ be as in the previous exercise. Show that if $S$ is a
separating set for $f$ over $F$, then the set

$$S' := \Bigl\{ \sum_{i=0}^{w-1} (\eta_j \lambda)^{p^i} \bmod f : 1 \le j \le w,\ \lambda \in S \Bigr\}$$

is a separating set for $f$ over $\mathbb{Z}_p$, where $\{\eta_j\}_{j=1}^{w}$ is the given basis for $F$ over $\mathbb{Z}_p$.
Show how to compute this set using $O(|S|\ell^2 \operatorname{len}(p) w(w-1))$ operations in $F$.


Exercise 20.16. Let $f$ be as in the previous two exercises, but further suppose
that each irreducible factor of $f$ is of the same degree, say $k$. Let $E := F[X]/(f)$
and $\xi := [X]_f \in E$. Define the polynomial $\phi \in E[Y]$ as follows:

$$\phi := \prod_{i=0}^{k-1} (Y - \xi^{q^i}).$$

If

$$\phi = Y^k + \alpha_{k-1} Y^{k-1} + \cdots + \alpha_0,$$

with $\alpha_0, \ldots, \alpha_{k-1} \in E$, show that the set

$$S := \{\operatorname{rep}(\alpha_i) : 0 \le i \le k - 1\}$$

is a separating set for $f$ over $F$, and can be computed deterministically using
$O(k^2 + k \operatorname{len}(q))$ operations in $E$, and hence $O(k^2 \ell^2 + k \ell^2 \operatorname{len}(q))$ operations in $F$.


Exercise 20.17. Put together all of the above pieces, along with Algorithms
SFD and DDF, so as to obtain a deterministic algorithm for factoring polynomials
over $F$ that runs in time at most $p$ times a polynomial in the size of the
input, and make a careful estimate of the running time of your algorithm.


Exercise 20.18. It is a fact that when our prime $p$ is odd, then for all integers
$a, b$, with $a \not\equiv b \pmod{p}$, there exists a non-negative integer $i \le p^{1/2} \log_2 p$ such
that $(a + i \mid p) \ne (b + i \mid p)$ (here, “$(\cdot \mid \cdot)$” is the Legendre symbol). Using this
fact, design and analyze a deterministic algorithm for factoring polynomials over
$F$ that runs in time at most $p^{1/2}$ times a polynomial in the size of the input.


The following two exercises show that the problem of factoring polynomials
over $F$ reduces in deterministic polynomial time to the problem of finding roots of
polynomials over $\mathbb{Z}_p$.

Exercise 20.19. Let $f$ be as in Exercise 20.14. Suppose that $S = \{\lambda_1, \ldots, \lambda_s\}$
is a separating set for $f$ over $\mathbb{Z}_p$, and $\phi_u \in F[X]$ is the minimal polynomial over $F$
of $[\lambda_u]_f \in F[X]/(f)$ for $u = 1, \ldots, s$. Show that each $\phi_u$ is the product of linear
factors over $\mathbb{Z}_p$, and that given $S$, along with the roots of all the $\phi_u$’s, we can
deterministically factor $f$ using (|S| + …) operations in $F$. Hint: see Exercise 16.9.
Exercise 20.20. Using the previous exercise, show that the problem of factoring
a polynomial over $F$ reduces in deterministic polynomial time to the problem of
finding roots of polynomials over $\mathbb{Z}_p$.


20.7 Notes 

The average-case analysis of Algorithm IPT, assuming its input is random, and
the application to the analysis of Algorithm RIP, is essentially due to Ben-Or [14].
If one implements Algorithm RIP using fast polynomial arithmetic, one gets an
expected cost of $O(\ell^{2+o(1)} \operatorname{len}(q))$ operations in $F$. Note that Ben-Or’s analysis
is a bit incomplete; see Exercise 32 in Chapter 7 of Bach and Shallit [11] for a
complete analysis of Ben-Or’s claims.

The asymptotically fastest probabilistic algorithm for constructing an irreducible
polynomial over $F$ of given degree $\ell$ is due to Shoup [96]. That algorithm uses an
expected number of $O(\ell^{2+o(1)} + \ell^{1+o(1)} \operatorname{len}(q))$ operations in $F$, and in fact does not
follow the “generate and test” paradigm of Algorithm RIP, but uses a completely
different approach.

As for deterministic algorithms for constructing irreducible polynomials of
given degree over $F$, the only known methods are efficient when the characteristic
$p$ of $F$ is small (see Chistov [26], Semaev [88], and Shoup [94]), or under a
generalization of the Riemann hypothesis (see Adleman and Lenstra [4]). Shoup 
[94] in fact shows that the problem of constructing an irreducible polynomial of 
given degree over F is deterministic, polynomial-time reducible to the problem of 
factoring polynomials over F . 

The algorithm in §20.2 for computing minimal polynomials over finite fields is 
due to Gordon [43]. 

The square-free decomposition of a polynomial over a field of characteristic
zero can be computed using an algorithm of Yun [111] using $O(\ell^{1+o(1)})$ field
operations. Yun’s algorithm can be adapted to work over finite fields as well (see 
Exercise 14.30 in von zur Gathen and Gerhard [39]). 

The Cantor-Zassenhaus algorithm was initially developed by Cantor and 
Zassenhaus [24], although many of the basic ideas can be traced back quite a 
ways. A straightforward implementation of this algorithm using fast polynomial
arithmetic uses an expected number of $O(\ell^{2+o(1)} \operatorname{len}(q))$ operations in $F$.

Berlekamp’s algorithm was initially developed by Berlekamp [15, 16], but again, 
the basic ideas go back a long way. A straightforward implementation using fast 
polynomial arithmetic uses an expected number of $O(\ell^3 + \ell^{1+o(1)} \operatorname{len}(q))$ operations
in $F$: the term $\ell^3$ may be replaced by $\ell^{\omega}$, where $\omega$ is the exponent of matrix
multiplication (see §14.6).

There are no known efficient, deterministic algorithms for factoring polynomials
over $F$ when the characteristic $p$ of $F$ is large (even under a generalization of the
Riemann hypothesis, except in certain special cases).

The asymptotically fastest algorithms for factoring polynomials over $F$ are due
to von zur Gathen, Kaltofen, and Shoup:† the algorithm of von zur Gathen and
Shoup [40] uses an expected number of $O(\ell^{2+o(1)} + \ell^{1+o(1)} \operatorname{len}(q))$ operations in
$F$; the algorithm of Kaltofen and Shoup [53] has a cost that is subquadratic in the
degree: it uses an expected number of $O(\ell^{1.815} \operatorname{len}(q)^{0.407})$ operations in $F$ when
$\operatorname{len}(q) = O(\ell^{1.375})$. Exercises 20.1, 20.8, and 20.9 are based on [40]. Although
the “fast” algorithms in [40] and [53] are mainly of theoretical interest, a variant
in [53], which uses $O(\ell^{2.5} + \ell^{1+o(1)} \operatorname{len}(q))$ operations in $F$, and space for $O(\ell^{1.5})$
elements of $F$, has proven to be quite practical (Exercise 20.13 develops some of
these ideas; see also Shoup [97]).


† The running times of these algorithms can be improved using faster algorithms for modular composition;
see footnote on p. 485.
Deterministic primality testing


For many years, despite much research in the area, there was no known deterministic,
polynomial-time algorithm for testing whether a given integer $n > 1$ is a prime.
However, that is no longer the case: the breakthrough algorithm of Agrawal,
Kayal, and Saxena, or Algorithm AKS for short, is just such an algorithm. Not only
is the result itself remarkable, but the algorithm is striking both in its simplicity,
and in the fact that the proof of its running time and correctness are completely
elementary (though ingenious).

We should stress at the outset that although this result is an important theoretical
result, as of yet, it has no real practical significance: probabilistic tests, such as the
Miller-Rabin test discussed in Chapter 10, are much more efficient, and a practically
minded person should not at all be bothered by the fact that such algorithms
may in theory make a mistake with an incredibly small probability.


21.1 The basic idea 

The algorithm is based on the following fact: 


Theorem 21.1. Let $n > 1$ be an integer. If $n$ is prime, then for all $a \in \mathbb{Z}_n$, we have
the following identity in the ring $\mathbb{Z}_n[X]$:

$$(X + a)^n = X^n + a. \qquad (21.1)$$

Conversely, if $n$ is composite, then for all $a \in \mathbb{Z}_n^*$, the identity (21.1) does not hold.


Proof. Note that

$$(X + a)^n = X^n + a^n + \sum_{i=1}^{n-1} \binom{n}{i} a^i X^{n-i}.$$

If $n$ is prime, then by Fermat’s little theorem (Theorem 2.14), we have $a^n = a$,
and by Exercise 1.14, all of the binomial coefficients $\binom{n}{i}$, for $i = 1, \ldots, n-1$, are
divisible by $n$, and hence their images in the ring $\mathbb{Z}_n$ vanish. That proves that the
identity (21.1) holds when $n$ is prime.

Conversely, suppose that $n$ is composite and that $a \in \mathbb{Z}_n^*$. Consider any prime
factor $p$ of $n$, and suppose $n = p^k m$, where $p \nmid m$.

We claim that $p^k \nmid \binom{n}{p}$. To prove the claim, one simply observes that

$$\binom{n}{p} = \frac{n(n-1) \cdots (n-p+1)}{p!},$$

and the numerator of this fraction is an integer divisible by $p^k$, but no higher power
of $p$, and the denominator is divisible by $p$, but no higher power of $p$. That proves
the claim.

From the claim, and the fact that $a \in \mathbb{Z}_n^*$, it follows that the coefficient of $X^{n-p}$
in $(X + a)^n$, namely $\binom{n}{p} a^p$, is not zero: since $p^k \mid n$ but $p^k \nmid \binom{n}{p}$, the image of
$\binom{n}{p}$ in $\mathbb{Z}_n$ is nonzero, and $a^p$ is a unit. Hence the identity (21.1) does not hold. □
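The proof can be checked numerically on small moduli. The sketch below (function names are illustrative) tests the identity (21.1) coefficient by coefficient using the binomial expansion above, and separately checks the claim $p^k \nmid \binom{n}{p}$ for a prime $p$ with $p^k$ exactly dividing $n$; the binomial coefficients come from Python's `math.comb`.

```python
from math import comb

def identity_holds(n, a):
    """Does (X + a)^n == X^n + a hold in Z_n[X]?  Compare coefficients via the binomial theorem."""
    for i in range(1, n):
        # coefficient of X^(n-i): C(n, i) * a^i on the left, 0 on the right
        if comb(n, i) * pow(a, i, n) % n != 0:
            return False
    return pow(a, n, n) == a % n       # constant terms: a^n versus a

def claim_holds(n, p):
    """For a prime p dividing n, with p^k exactly dividing n: is C(n, p) not divisible by p^k?"""
    k, m = 0, n
    while m % p == 0:
        m //= p
        k += 1
    return comb(n, p) % p ** k != 0
```

For example, `identity_holds(9, 1)` is false because $\binom{9}{3} = 84 \equiv 3 \pmod 9$, matching the claim with $p = 3$, $k = 2$.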

Of course, Theorem 21.1 does not immediately give rise to an efficient primality
test, since just evaluating the left-hand side of the identity (21.1) takes time $\Omega(n)$ in
the worst case. The key observation of Agrawal, Kayal, and Saxena is that if (21.1)
holds modulo $X^r - 1$ for a suitably chosen value of $r$, and for sufficiently many $a$,
then n must be prime. To make this idea work, one must show that a suitable r 
exists that is bounded by a polynomial in len(n), and that the number of different 
values of a that must be tested is also bounded by a polynomial in len(n). 


21.2 The algorithm and its analysis 

The algorithm is shown in Fig. 21.1. A few remarks on implementation are in
order:

• In step 1, we can use the algorithm for perfect-power testing discussed in
Exercise 3.31.

• The search for $r$ in step 2 can just be done by brute-force search; likewise,
the determination of the multiplicative order of $[n]_r \in \mathbb{Z}_r^*$ can be done by
brute force: after verifying that $\gcd(n, r) = 1$, compute successive powers
of $n$ modulo $r$ until we get 1.
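The brute-force search of step 2 might look like this in Python. The sketch assumes $\operatorname{len}(n)$ is the bit length of $n$ (`int.bit_length()`), and the names `order_exceeds` and `find_r` are illustrative. Rather than computing the full order, it stops as soon as it has seen $4\operatorname{len}(n)^2$ successive powers of $n$ modulo $r$ without hitting 1.

```python
from math import gcd

def order_exceeds(n, r, bound):
    """Assuming gcd(n, r) = 1: True iff the multiplicative order of n mod r exceeds bound."""
    x = 1
    for _ in range(bound):
        x = x * n % r          # successive powers n^1, n^2, ... modulo r
        if x == 1:
            return False
    return True

def find_r(n):
    """Brute-force search for r in step 2 of Algorithm AKS, with len(n) = n.bit_length()."""
    bound = 4 * n.bit_length() ** 2
    r = 2
    while not (gcd(n, r) > 1 or order_exceeds(n, r, bound)):
        r += 1
    return r
```

For $n = 91 = 7 \cdot 13$ the search stops at $r = 7$ because $\gcd(91, 7) > 1$; for the prime $n = 31$ no $r < 31$ can have order above $4 \cdot 5^2 = 100$, so it stops at $r = 31$.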

We want to prove that Algorithm AKS runs in polynomial time and is correct. 
To prove that it runs in polynomial time, it clearly suffices to prove that there exists 
an integer $r$ satisfying the condition in step 2 that is bounded by a polynomial in
$\operatorname{len}(n)$, since all other computations can be carried out in time $(r + \operatorname{len}(n))^{O(1)}$.
Correctness means that it outputs true if and only if n is prime. 
On input $n$, where $n$ is an integer and $n > 1$, do the following:

    1. if n is of the form a^b for integers a > 1 and b > 1 then
           return false
    2. find the smallest integer r > 1 such that either
           gcd(n, r) > 1
       or
           gcd(n, r) = 1 and
           [n]_r ∈ Z_r^* has multiplicative order > 4 len(n)^2
    3. if r = n then return true
    4. if gcd(n, r) > 1 then return false
    5. for j ← 1 to 2 len(n)⌊r^{1/2}⌋ + 1 do
           if (X + j)^n ≢ X^n + j (mod X^r − 1) in the ring Z_n[X] then
               return false
    6. return true

Fig. 21.1. Algorithm AKS
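For experimentation only, here is an unoptimized Python transcription of Fig. 21.1, assuming $\operatorname{len}(n)$ is `n.bit_length()`. Step 1 uses a simple integer-root search rather than the method of Exercise 3.31, and arithmetic in $\mathbb{Z}_n[X]/(X^r - 1)$ is done naively on length-$r$ coefficient lists; all helper names are hypothetical.

```python
from math import gcd, isqrt

def iroot(n, b):
    """Largest integer a >= 1 with a**b <= n (binary search)."""
    lo, hi = 1, 1 << (n.bit_length() // b + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** b <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def is_perfect_power(n):
    """Step 1: is n = a^b for integers a > 1, b > 1?"""
    return any(iroot(n, b) ** b == n for b in range(2, n.bit_length() + 1))

def find_r(n):
    """Step 2: smallest r > 1 with gcd(n, r) > 1, or gcd(n, r) = 1 and ord_r(n) > 4 len(n)^2."""
    bound = 4 * n.bit_length() ** 2
    r = 2
    while gcd(n, r) == 1:
        x = 1
        for _ in range(bound):
            x = x * n % r
            if x == 1:
                break
        else:
            return r                   # none of n^1, ..., n^bound is 1 mod r
        r += 1
    return r                           # gcd(n, r) > 1

def mulmod(a, b, r, n):
    """Product in Z_n[X]/(X^r - 1), elements given as length-r coefficient lists."""
    c = [0] * r
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                c[(i + j) % r] = (c[(i + j) % r] + x * y) % n
    return c

def congruence_holds(n, j, r):
    """Does (X + j)^n == X^n + j (mod X^r - 1) hold in Z_n[X]?  Assumes r >= 2."""
    result = [1] + [0] * (r - 1)
    base = [j % n, 1] + [0] * (r - 2)
    e = n
    while e:                           # repeated squaring in Z_n[X]/(X^r - 1)
        if e & 1:
            result = mulmod(result, base, r, n)
        base = mulmod(base, base, r, n)
        e >>= 1
    rhs = [0] * r
    rhs[0] = j % n
    rhs[n % r] = (rhs[n % r] + 1) % n  # X^n reduces to X^(n mod r)
    return result == rhs

def aks(n):
    """Algorithm AKS (Fig. 21.1) for an integer n > 1."""
    if is_perfect_power(n):                                  # step 1
        return False
    r = find_r(n)                                            # step 2
    if r == n:                                               # step 3
        return True
    if gcd(n, r) > 1:                                        # step 4
        return False
    for j in range(1, 2 * n.bit_length() * isqrt(r) + 2):    # step 5
        if not congruence_holds(n, j, r):
            return False
    return True                                              # step 6
```

For $n < 256$ the bound $4\operatorname{len}(n)^2$ is at least $n$, so step 2 always stops at a divisor of $n$ (or at $n$ itself) and steps 3 and 4 decide the answer; step 5 only comes into play for larger $n$, where this naive version becomes very slow.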


21.2.1 Running time analysis 

The question of the running time of Algorithm AKS is settled by the following 
fact: 

Theorem 21.2. For integers $n > 1$ and $m > 1$, the least prime $r$ such that $r \nmid n$
and the multiplicative order of $[n]_r \in \mathbb{Z}_r^*$ is greater than $m$ is $O(m^2 \operatorname{len}(n))$.

Proof. Call a prime $r$ “good” if $r \nmid n$ and the multiplicative order of $[n]_r \in \mathbb{Z}_r^*$ is
greater than $m$, and otherwise call $r$ “bad.” If $r$ is bad, then either $r \mid n$ or $r \mid (n^d - 1)$
for some $d = 1, \ldots, m$. Thus, any bad prime $r$ satisfies

$$r \;\Big|\; n \prod_{d=1}^{m} (n^d - 1).$$

If all primes $r$ up to some given bound $x \ge 2$ are bad, then the product of all primes
up to $x$ divides $n \prod_{d=1}^{m} (n^d - 1)$, and so in particular,

$$\prod_{r \le x} r \le n \prod_{d=1}^{m} (n^d - 1),$$

where the first product is over all primes $r$ up to $x$. Taking logarithms, we obtain

$$\sum_{r \le x} \log r \le \log\Bigl(n \prod_{d=1}^{m} (n^d - 1)\Bigr) \le (\log n)\Bigl(1 + \sum_{d=1}^{m} d\Bigr) = (\log n)(1 + m(m+1)/2).$$

But by Theorem 5.7, we have

$$\sum_{r \le x} \log r \ge c x$$

for some constant $c > 0$, from which it follows that

$$x \le c^{-1} (\log n)(1 + m(m+1)/2),$$

and the theorem follows. □
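The good/bad classification in the proof can be turned directly into a naive search. The sketch below (hypothetical names, trial division for primality) looks for the least good prime, testing badness exactly as in the proof, namely whether $r$ divides $n \prod_{d=1}^{m}(n^d - 1)$.

```python
from math import isqrt

def is_prime(k):
    """Trial division; adequate for the small r considered here."""
    return k >= 2 and all(k % d for d in range(2, isqrt(k) + 1))

def is_bad(r, n, m):
    """A prime r is bad iff r divides n * prod_{d=1}^{m} (n^d - 1)."""
    return n % r == 0 or any(pow(n, d, r) == 1 for d in range(1, m + 1))

def least_good_prime(n, m):
    """Least prime r with r not dividing n and ord_r(n) > m, found by direct search."""
    r = 2
    while not (is_prime(r) and not is_bad(r, n, m)):
        r += 1
    return r
```

For example, with $n = 10$ and $m = 4$, the primes 2 and 5 divide $n$, 3 divides $10 - 1$, and 7 is the first good prime, since $10 \equiv 3$ has order 6 modulo 7.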

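From this theorem, applied with $m := 4\operatorname{len}(n)^2$, it follows that the value of $r$ found in step 2 (which need
not be prime) will be $O(\operatorname{len}(n)^5)$: the least prime given by the theorem satisfies the condition in step 2, so the search stops no later than there. From this, we obtain: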

Theorem 21.3. Algorithm AKS can be implemented so that its running time is
$O(\operatorname{len}(n)^{16.5})$.

Proof. As discussed above, the value of $r$ determined in step 2 will be $O(\operatorname{len}(n)^5)$.
It is fairly straightforward to see that the running time of the algorithm is dominated
by the running time of step 5. Here, we have to perform $O(r^{1/2} \operatorname{len}(n))$ exponentiations
to the power $n$ in the ring $\mathbb{Z}_n[X]/(X^r - 1)$. Each of these exponentiations takes
$O(\operatorname{len}(n))$ operations in $\mathbb{Z}_n[X]/(X^r - 1)$, each of which takes $O(r^2)$ operations in
$\mathbb{Z}_n$, each of which takes time $O(\operatorname{len}(n)^2)$. This yields a running time bounded by a
constant times

$$r^{1/2} \operatorname{len}(n) \times \operatorname{len}(n) \times r^2 \times \operatorname{len}(n)^2 = r^{2.5} \operatorname{len}(n)^4.$$

Substituting the bound $O(\operatorname{len}(n)^5)$ for $r$, we obtain the desired bound. □
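Spelled out, the final substitution is just exponent arithmetic:

```latex
r^{2.5} \operatorname{len}(n)^4
  = O\bigl((\operatorname{len}(n)^5)^{2.5}\bigr) \operatorname{len}(n)^4
  = O\bigl(\operatorname{len}(n)^{12.5 + 4}\bigr)
  = O\bigl(\operatorname{len}(n)^{16.5}\bigr).
```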


21.2.2 Correctness 

As for the correctness of Algorithm AKS, we first show: 

Theorem 21.4. If the input to Algorithm AKS is prime, then the output is true. 

Proof. Assume that the input $n$ is prime. The test in step 1 will certainly fail. If the
algorithm does not return true in step 3, then certainly the test in step 4 will fail as
well. If the algorithm reaches step 5, then all of the tests in the loop in step 5 will
fail; this follows from Theorem 21.1. □


The interesting case is the following: 
Theorem 21.5. If the input to Algorithm AKS is composite, then the output is
false.

The proof of this theorem is rather long, and is the subject of the remainder of
this section.

Suppose the input $n$ is composite. If $n$ is a prime power, then this will be detected
in step 1, so we may assume that $n$ is not a prime power. Assume that the algorithm
has found a suitable value of $r$ in step 2. Clearly, the test in step 3 will fail (the smallest
prime factor of $n$ is less than $n$ and satisfies the condition in step 2, so $r < n$). If the test
in step 4 passes, we are done, so we may assume that this test fails; that is, we may
assume that all prime factors of $n$ are greater than $r$. Our goal now is to show that
one of the tests in the loop in step 5 must pass. The proof will be by contradiction:
we shall assume that none of the tests pass, and derive a contradiction.

The assumption that none of the tests in step 5 pass means that in the ring $\mathbb{Z}_n[X]$,
the following congruences hold:

$$(X + j)^n \equiv X^n + j \pmod{X^r - 1} \quad (j = 1, \ldots, 2\operatorname{len}(n)\lfloor r^{1/2} \rfloor + 1). \qquad (21.2)$$

For the rest of the proof, we fix a particular prime divisor $p$ of $n$; the choice
of $p$ does not matter. Since $p \mid n$, we have a natural ring homomorphism from
$\mathbb{Z}_n[X]$ to $\mathbb{Z}_p[X]$ (see Examples 7.52 and 7.46), which implies that the congruences
(21.2) hold in the ring of polynomials over $\mathbb{Z}_p$ as well. From now on, we shall work
exclusively with polynomials over $\mathbb{Z}_p$.

Let us state in somewhat more abstract terms the precise assumptions we are
making in order to derive our contradiction:

(A0) $n > 1$, $r > 1$, and $\ell \ge 1$ are integers, $p$ is a prime dividing $n$, and
$\gcd(n, r) = 1$;

(A1) $n$ is not a prime power;

(A2) $p > r$;

(A3) the congruences

$$(X + j)^n \equiv X^n + j \pmod{X^r - 1} \quad (j = 1, \ldots, \ell)$$

hold in the ring $\mathbb{Z}_p[X]$;

(A4) the multiplicative order of $[n]_r \in \mathbb{Z}_r^*$ is greater than $4 \operatorname{len}(n)^2$;

(A5) $\ell > 2 \operatorname{len}(n) \lfloor r^{1/2} \rfloor$.

The rest of the proof will rely only on these assumptions, and not on any other
details of Algorithm AKS. From now on, only assumption (A0) will be implicitly
in force. The other assumptions will be explicitly invoked as necessary. Our goal
is to show that assumptions (A1), (A2), (A3), (A4), and (A5) cannot all be true
simultaneously.
Define the $\mathbb{Z}_p$-algebra $E := \mathbb{Z}_p[X]/(X^r - 1)$, and let $\xi := [X]_{X^r - 1} \in E$, so that
$E = \mathbb{Z}_p[\xi]$. Every element of $E$ can be expressed uniquely as $g(\xi) = [g]_{X^r - 1}$, for
$g \in \mathbb{Z}_p[X]$ of degree less than $r$, and for an arbitrary polynomial $g \in \mathbb{Z}_p[X]$, we
have $g(\xi) = 0$ if and only if $(X^r - 1) \mid g$. Note that $\xi \in E^*$ and has multiplicative
order $r$: indeed, $\xi^r = 1$, and $\xi^s - 1$ cannot be zero for $0 < s < r$, since $X^s - 1$ is a nonzero
polynomial of degree less than $r$.

Assumption (A3) implies that we have a number of interesting identities in the
$\mathbb{Z}_p$-algebra $E$:

$$(\xi + j)^n = \xi^n + j \quad (j = 1, \ldots, \ell).$$

For the polynomials $g_j := X + j \in \mathbb{Z}_p[X]$, with $j$ in the given range, these identities
say that $g_j(\xi)^n = g_j(\xi^n)$.

In order to exploit these identities, we study more generally functions $\sigma_k$, for
various integer values $k$, that send $g(\xi) \in E$ to $g(\xi^k)$, for arbitrary $g \in \mathbb{Z}_p[X]$, and
we investigate the implications of the assumption that such functions behave like
the $k$-power map on certain inputs. To this end, let $\mathbb{Z}^{(r)}$ denote the set of all positive
integers $k$ such that $\gcd(r, k) = 1$. Note that the set $\mathbb{Z}^{(r)}$ is multiplicative, by which
we mean $1 \in \mathbb{Z}^{(r)}$, and $kk' \in \mathbb{Z}^{(r)}$ for all $k, k' \in \mathbb{Z}^{(r)}$. Also note that because of our
assumption (A0), both $n$ and $p$ are in $\mathbb{Z}^{(r)}$. For $k \in \mathbb{Z}^{(r)}$, let $\hat{\sigma}_k : \mathbb{Z}_p[X] \to E$ be
the polynomial evaluation map that sends $g \in \mathbb{Z}_p[X]$ to $g(\xi^k)$. This is of course a
$\mathbb{Z}_p$-algebra homomorphism, and we have:

Lemma 21.6. For all $k \in \mathbb{Z}^{(r)}$, the kernel of $\hat\sigma_k$ is $(X^r - 1)$, and the image of $\hat\sigma_k$ is $E$.

Proof. Let $J := \operatorname{Ker} \hat\sigma_k$, which is an ideal of $\mathbb{Z}_p[X]$. Let $k'$ be a positive integer such that $kk' \equiv 1 \pmod{r}$, which exists because $\gcd(r, k) = 1$.

To show that $J = (X^r - 1)$, we first observe that

$$\hat\sigma_k(X^r - 1) = (\xi^k)^r - 1 = (\xi^r)^k - 1 = 1^k - 1 = 0,$$

and hence $(X^r - 1) \subseteq J$.

Next, we show that $J \subseteq (X^r - 1)$. Let $g \in J$. We want to show that $(X^r - 1) \mid g$. Now, $g \in J$ means that $g(\xi^k) = 0$. If we set $h := g(X^k)$, this implies that $h(\xi) = 0$, which means that $(X^r - 1) \mid h$. So let us write $h = (X^r - 1) f$, for some $f \in \mathbb{Z}_p[X]$. Then

$$g(\xi) = g(\xi^{kk'}) = h(\xi^{k'}) = (\xi^{k'r} - 1) f(\xi^{k'}) = 0,$$

which implies that $(X^r - 1) \mid g$.

That finishes the proof that $J = (X^r - 1)$.

Finally, to show that $\hat\sigma_k$ is surjective, suppose we are given an arbitrary element of $E$, which we can express as $g(\xi)$ for some $g \in \mathbb{Z}_p[X]$. Now set $h := g(X^{k'})$, and observe that

$$\hat\sigma_k(h) = h(\xi^k) = g(\xi^{kk'}) = g(\xi). \qquad \square$$

Because of Lemma 21.6, then by Theorem 7.26, the map $\sigma_k : E \to E$ that sends $g(\xi) \in E$ to $g(\xi^k)$, for $g \in \mathbb{Z}_p[X]$, is well defined, and is a ring automorphism (indeed, a $\mathbb{Z}_p$-algebra automorphism) on $E$. Note that for all $k, k' \in \mathbb{Z}^{(r)}$, we have

• $\sigma_k = \sigma_{k'}$ if and only if $\xi^k = \xi^{k'}$ if and only if $k \equiv k' \pmod{r}$, and

• $\sigma_k \circ \sigma_{k'} = \sigma_{k'} \circ \sigma_k = \sigma_{kk'}$.

So in fact, the set $\{\sigma_k : k \in \mathbb{Z}^{(r)}\}$ under composition forms an abelian group that is isomorphic to $\mathbb{Z}_r^*$.

Remark. It is perhaps helpful (but not necessary for the proof) to examine the behavior of the map $\sigma_k$ in a bit more detail. Let $\alpha \in E$, and let

$$\alpha = \sum_{i=0}^{r-1} a_i \xi^i$$

be the canonical representation of $\alpha$. Since $\gcd(r, k) = 1$, the map $\pi : \{0, \ldots, r-1\} \to \{0, \ldots, r-1\}$ that sends $i$ to $ki \bmod r$ is a permutation whose inverse is the permutation $\pi'$ that sends $i$ to $k'i \bmod r$, where $k'$ is a multiplicative inverse of $k$ modulo $r$. Then we have

$$\sigma_k(\alpha) = \sum_{i=0}^{r-1} a_i \xi^{ki} = \sum_{i=0}^{r-1} a_i \xi^{\pi(i)} = \sum_{i=0}^{r-1} a_{\pi'(i)} \xi^i.$$

Thus, the action of $\sigma_k$ is to permute the coordinate vector $(a_0, \ldots, a_{r-1})$ of $\alpha$, sending $\alpha$ to the element in $E$ whose coordinate vector is $(a_{\pi'(0)}, \ldots, a_{\pi'(r-1)})$. So we see that although we defined the maps $\sigma_k$ in a rather “highbrow” algebraic fashion, their behavior in concrete terms is actually quite simple.
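The coefficient-permutation description of $\sigma_k$ can be checked mechanically. The following Python sketch is illustrative only: the helper names `mul_mod` and `sigma` are hypothetical, and it assumes an element of $E = \mathbb{Z}_p[X]/(X^r - 1)$ is stored as its coefficient list $(a_0, \ldots, a_{r-1})$. It applies $\sigma_k$ by moving coefficient $a_i$ to position $ki \bmod r$, exactly as in the remark, so that the new coefficient at position $i$ is $a_{k'i \bmod r}$.

```python
def mul_mod(a, b, r, p):
    """Product of a and b in E = Z_p[X]/(X^r - 1); both are length-r coefficient lists."""
    c = [0] * r
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # X^i * X^j = X^(i+j), and X^r = 1 in E, so exponents wrap modulo r
            c[(i + j) % r] = (c[(i + j) % r] + ai * bj) % p
    return c


def sigma(a, k, r):
    """Apply sigma_k: the coefficient of xi^i moves to xi^(k*i mod r).

    Assumes gcd(k, r) = 1, so that i -> k*i mod r permutes {0, ..., r-1}.
    """
    b = [0] * r
    for i, ai in enumerate(a):
        b[(k * i) % r] = ai
    return b


# Example with r = 7, p = 11, k = 3 (so k' = 5, since 3 * 5 = 15 = 1 mod 7).
r, p, k = 7, 11, 3
alpha = [1, 2, 0, 5, 0, 0, 4]
beta = [3, 0, 7, 1, 0, 2, 0]
k_inv = pow(k, -1, r)  # multiplicative inverse k' of k modulo r (Python 3.8+)
```

With these values one can confirm that `sigma` respects multiplication in $E$, that `sigma(alpha, k, r)` equals `[alpha[(k_inv * i) % r] for i in range(r)]`, and that $\sigma_3 \circ \sigma_5 = \sigma_{15} = \sigma_1$ is the identity modulo 7.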

Recall that the $p$-power map on $E$ is a $\mathbb{Z}_p$-algebra homomorphism (see Theorem 19.7), and so for all $\alpha \in E$, if $\alpha = g(\xi)$ for $g \in \mathbb{Z}_p[X]$, then (by Theorem 16.7) we have

$$\alpha^p = g(\xi)^p = g(\xi^p) = \sigma_p(\alpha).$$

Thus, $\sigma_p$ acts just like the $p$-power map on all elements of $E$.
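A quick numerical check of the identity $\alpha^p = \sigma_p(\alpha)$: the Python sketch below (hypothetical helper names, with an element of $E$ stored as a length-$r$ coefficient list) raises random elements to the $p$-th power by repeated squaring and compares the result with the exponent permutation $i \mapsto pi \bmod r$. It assumes $p$ is prime and $\gcd(p, r) = 1$, as guaranteed by (A0).

```python
import random


def mul_mod(a, b, r, p):
    """Product in E = Z_p[X]/(X^r - 1), elements given as length-r coefficient lists."""
    c = [0] * r
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % r] = (c[(i + j) % r] + ai * bj) % p
    return c


def pow_mod(a, e, r, p):
    """Compute a^e in E by repeated squaring."""
    result = [1] + [0] * (r - 1)
    while e > 0:
        if e & 1:
            result = mul_mod(result, a, r, p)
        a = mul_mod(a, a, r, p)
        e >>= 1
    return result


def sigma(a, k, r):
    """sigma_k: move the coefficient of xi^i to xi^(k*i mod r); needs gcd(k, r) = 1."""
    b = [0] * r
    for i, ai in enumerate(a):
        b[(k * i) % r] = ai
    return b


def frobenius_matches(r, p, trials=20, seed=1):
    """Check alpha^p == sigma_p(alpha) for random alpha in E (p prime, gcd(p, r) = 1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        alpha = [rng.randrange(p) for _ in range(r)]
        if pow_mod(alpha, p, r, p) != sigma(alpha, p, r):
            return False
    return True
```

The check succeeds for every admissible choice of $p$ and $r$, because in characteristic $p$ the $p$-th power of a sum is the sum of the $p$-th powers and $a_i^p = a_i$ in $\mathbb{Z}_p$.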

We can restate assumption (A3) as follows:

$$\sigma_n(\xi + j) = (\xi + j)^n \quad (j = 1, \ldots, \ell).$$

That is to say, the map $\sigma_n$ acts just like the $n$-power map on the elements $\xi + j$ for $j = 1, \ldots, \ell$.
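Assumption (A3) is a finite computation. The Python sketch below (hypothetical names; coefficients reduced modulo an arbitrary modulus `m`, which plays the role of $p$) computes $(X + j)^n \bmod (X^r - 1)$ by repeated squaring and compares it with $X^{n \bmod r} + j$. For a prime $n$ with `m = n` the check always passes. For the composite $n = 15$ with $m = 3$ and $r = 7$ it already fails at $j = 1$: in $\mathbb{Z}_3[X]$ we have $(X + 1)^{15} = (X^3 + 1)^5$, whose term $5X^3 = 2X^3$ survives reduction modulo $X^7 - 1$.

```python
def mul_mod(a, b, r, m):
    """Product in Z_m[X]/(X^r - 1); elements are length-r coefficient lists."""
    c = [0] * r
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                c[(i + j) % r] = (c[(i + j) % r] + ai * bj) % m
    return c


def pow_mod(a, e, r, m):
    """a^e in Z_m[X]/(X^r - 1) by repeated squaring."""
    result = [1 % m] + [0] * (r - 1)
    while e > 0:
        if e & 1:
            result = mul_mod(result, a, r, m)
        a = mul_mod(a, a, r, m)
        e >>= 1
    return result


def congruences_hold(n, r, ell, m):
    """Check (X + j)^n == X^n + j (mod X^r - 1) over Z_m for j = 1, ..., ell (assumes r >= 2)."""
    for j in range(1, ell + 1):
        lhs = pow_mod([j % m, 1] + [0] * (r - 2), n, r, m)
        rhs = [0] * r
        rhs[n % r] = 1             # X^n reduces to X^(n mod r)
        rhs[0] = (rhs[0] + j) % m  # plus the constant j
        if lhs != rhs:
            return False
    return True
```

The sketch takes the modulus as a parameter; in (A3) itself the congruences are required in $\mathbb{Z}_p[X]$.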

Now, although the $\sigma_p$ map must act like the $p$-power map on all of $E$, there is no good reason why the $\sigma_n$ map should act like the $n$-power map on any particular element of $E$, and so the fact that it does so on all the elements $\xi + j$ for $j = 1, \ldots, \ell$ looks decidedly suspicious. To turn our suspicions into a contradiction, let us start by defining some notation. For $\alpha \in E$, let us define

$$C(\alpha) := \{k \in \mathbb{Z}^{(r)} : \sigma_k(\alpha) = \alpha^k\},$$

and for $k \in \mathbb{Z}^{(r)}$, let us define

$$D(k) := \{\alpha \in E : \sigma_k(\alpha) = \alpha^k\}.$$

In words: $C(\alpha)$ is the set of all $k$ for which $\sigma_k$ acts like the $k$-power map on $\alpha$, and $D(k)$ is the set of all $\alpha$ for which $\sigma_k$ acts like the $k$-power map on $\alpha$. From the discussion above, we have $p \in C(\alpha)$ for all $\alpha \in E$, and it is also clear that $1 \in C(\alpha)$ for all $\alpha \in E$. Also, it is clear that $\alpha \in D(p)$ for all $\alpha \in E$, and $1_E \in D(k)$ for all $k \in \mathbb{Z}^{(r)}$.

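The following two simple lemmas say that the sets $C(\alpha)$ and $D(k)$ are multiplicative.

Lemma 21.7. For every $\alpha \in E$, if $k \in C(\alpha)$ and $k' \in C(\alpha)$, then $kk' \in C(\alpha)$.

Proof. If $\sigma_k(\alpha) = \alpha^k$ and $\sigma_{k'}(\alpha) = \alpha^{k'}$, then

$$\sigma_{kk'}(\alpha) = \sigma_k(\sigma_{k'}(\alpha)) = \sigma_k(\alpha^{k'}) = (\sigma_k(\alpha))^{k'} = (\alpha^k)^{k'} = \alpha^{kk'},$$

where we have made use of the homomorphic property of $\sigma_k$. $\square$

Lemma 21.8. For every $k \in \mathbb{Z}^{(r)}$, if $\alpha \in D(k)$ and $\beta \in D(k)$, then $\alpha\beta \in D(k)$.

Proof. If $\sigma_k(\alpha) = \alpha^k$ and $\sigma_k(\beta) = \beta^k$, then

$$\sigma_k(\alpha\beta) = \sigma_k(\alpha)\sigma_k(\beta) = \alpha^k \beta^k = (\alpha\beta)^k,$$

where again, we have made use of the homomorphic property of $\sigma_k$. $\square$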

Let us define

• $s$ to be the multiplicative order of $[p]_r \in \mathbb{Z}_r^*$, and

• $t$ to be the order of the subgroup of $\mathbb{Z}_r^*$ generated by $[p]_r$ and $[n]_r$.

Since $r \mid (p^s - 1)$, if we take any extension field $F$ of degree $s$ over $\mathbb{Z}_p$ (which we know exists by Theorem 19.12), then since $F^*$ is cyclic (Theorem 7.29) and has order $p^s - 1$, we know that there exists an element $\zeta \in F^*$ of multiplicative order $r$ (Theorem 6.32). Let us define the polynomial evaluation map $\hat\tau : \mathbb{Z}_p[X] \to F$ that sends $g \in \mathbb{Z}_p[X]$ to $g(\zeta) \in F$. Since $X^r - 1$ is clearly in the kernel of $\hat\tau$, then by Theorem 7.27, the map $\tau : E \to F$ that sends $g(\xi)$ to $g(\zeta)$, for $g \in \mathbb{Z}_p[X]$, is a well-defined ring homomorphism, and actually, it is a $\mathbb{Z}_p$-algebra homomorphism.
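The quantities $s$ and $t$ are easy to compute for concrete values. The Python sketch below (function names hypothetical) computes the multiplicative order of a residue, and the order of the subgroup of $\mathbb{Z}_r^*$ generated by a list of residues by closing $\{1\}$ under multiplication by the generators (in a finite group this closure is exactly the generated subgroup). The example values $n = 15$, $p = 3$, $r = 13$ satisfy (A0) but not (A2); they only illustrate the definitions. Here $s = 3$, since $3^3 = 27 \equiv 1 \pmod{13}$, and $t = 12$, since $15 \equiv 2 \pmod{13}$ and $2$ generates $\mathbb{Z}_{13}^*$.

```python
from math import gcd


def mult_order(a, r):
    """Multiplicative order of [a]_r in Z_r^* (requires r > 1 and gcd(a, r) = 1)."""
    assert r > 1 and gcd(a, r) == 1
    x, k = a % r, 1
    while x != 1:
        x = (x * a) % r
        k += 1
    return k


def subgroup_order(gens, r):
    """Order of the subgroup of Z_r^* generated by the residues in gens."""
    group, frontier = {1}, [1]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x * g) % r
            if y not in group:
                group.add(y)
                frontier.append(y)
    return len(group)


n, p, r = 15, 3, 13
s = mult_order(p, r)           # order of [p]_r
t = subgroup_order([p, n], r)  # order of the subgroup generated by [p]_r and [n]_r
```

Since $\langle [p]_r \rangle$ is a subgroup of the group of order $t$, the value $s$ always divides $t$.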

For concreteness, one could think of $F$ as $\mathbb{Z}_p[X]/(f)$, where $f$ is an irreducible factor of $X^r - 1$ of degree $s$. In this case, we could simply take $\zeta$ to be $[X]_f$ (see Example 19.1), and the map $\hat\tau$ above would be just the natural map from $\mathbb{Z}_p[X]$ to $\mathbb{Z}_p[X]/(f)$.

The key to deriving our contradiction is to examine the set $S := \tau(D(n))$, that is, the image under $\tau$ of the set $D(n)$ of all elements $\alpha \in E$ for which $\sigma_n$ acts like the $n$-power map.

Lemma 21.9. Under assumption (A1), we have

$$|S| \le n^{2\lfloor t^{1/2} \rfloor}.$$

Proof. Consider the set of integers

$$I := \{n^u p^v : u, v = 0, \ldots, \lfloor t^{1/2} \rfloor\}.$$

We first claim that $|I| > t$. To prove this, we first show that each distinct pair $(u, v)$ gives rise to a distinct value $n^u p^v$. To this end, we make use of our assumption (A1) that $n$ is not a prime power, and so is divisible by some prime $q$ other than $p$. Thus, if $(u', v') \ne (u, v)$, then either

• $u \ne u'$, in which case the power of $q$ in the prime factorization of $n^u p^v$ is different from that in $n^{u'} p^{v'}$, or

• $u = u'$ and $v \ne v'$, in which case the power of $p$ in the prime factorization of $n^u p^v$ is different from that in $n^{u'} p^{v'}$.

The claim now follows from the fact that both $u$ and $v$ range over a set of size $\lfloor t^{1/2} \rfloor + 1 > t^{1/2}$, and so there are strictly more than $t$ such pairs $(u, v)$.

Next, recall that $t$ was defined to be the order of the subgroup of $\mathbb{Z}_r^*$ generated by $[n]_r$ and $[p]_r$; equivalently, $t$ is the number of distinct residue classes of the form $[n^u p^v]_r$, where $u$ and $v$ range over all non-negative integers. Since each element of $I$ is of the form $n^u p^v$, and $|I| > t$, we may conclude that there must be two distinct elements of $I$, call them $k$ and $k'$, that are congruent modulo $r$. Furthermore, any element of $I$ is a product of two positive integers each of which is at most $n^{\lfloor t^{1/2} \rfloor}$, and so both $k$ and $k'$ lie in the range $1, \ldots, n^{2\lfloor t^{1/2} \rfloor}$.
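The counting argument can be replayed on small numbers. The Python sketch below (names hypothetical) builds $I$ for $n = 15$, $p = 3$, $r = 13$, taking $t = 12$ as given (it is the order of $\langle [3]_{13}, [15]_{13} \rangle = \mathbb{Z}_{13}^*$). Since $15 = 3 \cdot 5$ is not a prime power, (A1) holds and the 16 pairs $(u, v)$ give 16 distinct integers; with only 12 residue classes available, two of them must collide modulo $r$, and both are at most $n^{2\lfloor t^{1/2} \rfloor}$.

```python
from math import isqrt


def build_I(n, p, t):
    """The set I = {n^u p^v : 0 <= u, v <= floor(sqrt(t))}."""
    b = isqrt(t)
    return {n ** u * p ** v for u in range(b + 1) for v in range(b + 1)}


def find_collision(I, r):
    """Return two distinct elements of I that are congruent modulo r, or None."""
    seen = {}
    for k in sorted(I):
        if k % r in seen:
            return seen[k % r], k
        seen[k % r] = k
    return None


n, p, r, t = 15, 3, 13, 12   # t = order of <[3]_13, [15]_13> in Z_13^*
I = build_I(n, p, t)
k1, k2 = find_collision(I, r)  # pigeonhole: |I| = 16 > 12 = t
```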

Now, let $\alpha \in D(n)$. This is equivalent to saying $n \in C(\alpha)$. We always have $1 \in C(\alpha)$ and $p \in C(\alpha)$, and so by Lemma 21.7, we have $n^u p^v \in C(\alpha)$ for all non-negative integers $u, v$, and so in particular, $k, k' \in C(\alpha)$.

Since both $k$ and $k'$ are in $C(\alpha)$, we have

$$\sigma_k(\alpha) = \alpha^k \quad \text{and} \quad \sigma_{k'}(\alpha) = \alpha^{k'}.$$

Since $k \equiv k' \pmod{r}$, we have $\sigma_k = \sigma_{k'}$, and hence

$$\alpha^k = \alpha^{k'}.$$

Now apply the homomorphism $\tau$, obtaining

$$\tau(\alpha)^k = \tau(\alpha)^{k'}.$$

Since this holds for all $\alpha \in D(n)$, we conclude that all elements of $S$ are roots of the polynomial $X^k - X^{k'}$. Since $k \ne k'$, we see that $X^k - X^{k'}$ is a non-zero polynomial of degree at most $\max\{k, k'\} \le n^{2\lfloor t^{1/2} \rfloor}$, and hence can have at most $n^{2\lfloor t^{1/2} \rfloor}$ roots in the field $F$ (Theorem 7.14). $\square$

Lemma 21.10. Under assumptions (A2) and (A3), we have

$$|S| \ge 2^{\min(t, \ell)} - 1.$$

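Proof. Let $m := \min(t, \ell)$. Under assumption (A3), we have $\xi + j \in D(n)$ for $j = 1, \ldots, m$. Under assumption (A2), we have $p > r > t \ge m$, and hence the integers $j = 1, \ldots, m$ are distinct modulo $p$. Define

$$P := \Big\{ \prod_{j=1}^{m} (X + j)^{e_j} \in \mathbb{Z}_p[X] : e_j \in \{0, 1\} \text{ for } j = 1, \ldots, m, \text{ and } \sum_{j=1}^{m} e_j < m \Big\}.$$

That is, we form $P$ by taking products over all proper subsets of $\{X + j : j = 1, \ldots, m\}$. Since the polynomials $X + j$ are distinct monic irreducibles in $\mathbb{Z}_p[X]$, unique factorization shows that distinct subsets give distinct products; hence $|P| = 2^m - 1$.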
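The count $|P| = 2^m - 1$ depends on the factors $X + j$ being distinct in $\mathbb{Z}_p[X]$, which is where (A2) enters. The Python sketch below (names hypothetical) enumerates the products over all proper subsets and counts the distinct polynomials obtained. With $m = 4$ and $p = 7$, all 15 products are distinct and have degree at most $m - 1 = 3$; with $p = 3$, the factors $X + 1$ and $X + 4$ coincide and only 11 distinct products remain.

```python
from itertools import product


def poly_mul(a, b, p):
    """Multiply polynomials over Z_p (coefficient lists, constant term first)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % p
    return c


def build_P(m, p):
    """Distinct products of (X + j)^e_j over Z_p, e_j in {0, 1}, with sum(e_j) < m."""
    P = set()
    for exps in product((0, 1), repeat=m):
        if sum(exps) == m:
            continue  # skip the product over the full set
        f = [1]
        for j, e in enumerate(exps, start=1):
            if e:
                f = poly_mul(f, [j % p, 1], p)
        P.add(tuple(f))  # monic, so the coefficient tuple is a canonical key
    return P
```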

Define $P(\xi) := \{f(\xi) \in E : f \in P\}$ and $P(\zeta) := \{f(\zeta) \in F : f \in P\}$. Note that $\tau(P(\xi)) = P(\zeta)$, and that by Lemma 21.8, $P(\xi) \subseteq D(n)$.

Therefore, to prove the lemma, it suffices to show that $|P(\zeta)| = 2^m - 1$. Suppose that this is not the case. This would give rise to distinct polynomials $g, h \in P$, both of degree at most $t - 1$, such that

$$g(\xi) \in D(n), \quad h(\xi) \in D(n), \quad \text{and} \quad \tau(g(\xi)) = \tau(h(\xi)).$$

So we have $n \in C(g(\xi))$ and (as always) $1, p \in C(g(\xi))$. Likewise, we have $1, n, p \in C(h(\xi))$. By Lemma 21.7, for all integers $k$ of the form $n^u p^v$, where $u$ and $v$ range over all non-negative integers, we have

$$k \in C(g(\xi)) \quad \text{and} \quad k \in C(h(\xi)).$$

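For each such $k$, since $\tau(g(\xi)) = \tau(h(\xi))$, we have $\tau(g(\xi))^k = \tau(h(\xi))^k$, and hence

$$\begin{aligned}
0 &= \tau(g(\xi))^k - \tau(h(\xi))^k \\
  &= \tau(g(\xi)^k) - \tau(h(\xi)^k) && (\tau \text{ is a homomorphism}) \\
  &= \tau(g(\xi^k)) - \tau(h(\xi^k)) && (k \in C(g(\xi)) \text{ and } k \in C(h(\xi))) \\
  &= g(\zeta^k) - h(\zeta^k) && (\text{definition of } \tau).
\end{aligned}$$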

Thus, the polynomial $f := g - h \in \mathbb{Z}_p[X]$ is a non-zero polynomial of degree at most $t - 1$, having roots $\zeta^k$ in the field $F$ for all $k$ of the form $n^u p^v$. Now, $t$ is by definition the number of distinct residue classes of the form $[n^u p^v]_r \in \mathbb{Z}_r^*$. Also, since $\zeta$ has multiplicative order $r$, for all integers $k, k'$, we have $\zeta^k = \zeta^{k'}$ if and only if $k \equiv k' \pmod{r}$. Therefore, as $k$ ranges over all integers of the form $n^u p^v$, $\zeta^k$ ranges over precisely $t$ distinct values in $F$. But since all of these values are roots of the polynomial $f$, which is non-zero and of degree at most $t - 1$, this is impossible (Theorem 7.14). $\square$

We are now (finally!) in a position to complete the proof of Theorem 21.5. Under assumptions (A1), (A2), and (A3), Lemmas 21.9 and 21.10 imply that

$$2^{\min(t, \ell)} - 1 \le |S| \le n^{2\lfloor t^{1/2} \rfloor}. \qquad (21.3)$$

The contradiction is provided by the following:

Lemma 21.11. Under assumptions (A4) and (A5), we have

$$2^{\min(t, \ell)} - 1 > n^{2\lfloor t^{1/2} \rfloor}.$$

Proof. Observe that $\log_2 n < \mathrm{len}(n)$, and so it suffices to show that

$$2^{\min(t, \ell)} - 1 > 2^{2\,\mathrm{len}(n) \lfloor t^{1/2} \rfloor},$$

and for this, it suffices to show that

$$\min(t, \ell) > 2\,\mathrm{len}(n) \lfloor t^{1/2} \rfloor,$$

since for all integers $a, b$ with $a > b \ge 1$, we have $2^a > 2^b + 1$.

To show that $t > 2\,\mathrm{len}(n) \lfloor t^{1/2} \rfloor$, it suffices to show that $t > 2\,\mathrm{len}(n)\, t^{1/2}$, or equivalently, that $t > 4\,\mathrm{len}(n)^2$. But observe that by definition, $t$ is the order of the subgroup of $\mathbb{Z}_r^*$ generated by $[n]_r$ and $[p]_r$, which is at least as large as the multiplicative order of $[n]_r$ in $\mathbb{Z}_r^*$, and by assumption (A4), this is larger than $4\,\mathrm{len}(n)^2$.

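Finally, since $t \le |\mathbb{Z}_r^*| < r$, assumption (A5) gives $\ell > 2\,\mathrm{len}(n) \lfloor r^{1/2} \rfloor \ge 2\,\mathrm{len}(n) \lfloor t^{1/2} \rfloor$. $\square$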

That concludes the proof of Theorem 21.5. 
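Lemma 21.11 can also be observed numerically. The Python sketch below is an illustration under stated assumptions, not a reproduction of Algorithm AKS: it takes $r$ to be the smallest integer with $\gcd(n, r) = 1$ and $\mathrm{ord}_r(n) > 4\,\mathrm{len}(n)^2$, so that (A4) holds, sets $\ell := 2\,\mathrm{len}(n)\lfloor r^{1/2} \rfloor + 1$, so that (A5) holds, and lets $p$ be the smallest prime factor of $n$. It uses Python's `int.bit_length` for $\mathrm{len}(n)$; all function names are hypothetical.

```python
from math import gcd, isqrt


def mult_order(a, r):
    """Multiplicative order of [a]_r (requires gcd(a, r) = 1 and r > 1)."""
    x, k = a % r, 1
    while x != 1:
        x = (x * a) % r
        k += 1
    return k


def subgroup_order(gens, r):
    """Order of the subgroup of Z_r^* generated by the residues in gens."""
    group, frontier = {1}, [1]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x * g) % r
            if y not in group:
                group.add(y)
                frontier.append(y)
    return len(group)


def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def lemma_21_11_holds(n):
    """Check 2^min(t, ell) - 1 > n^(2 floor(sqrt t)) for parameters satisfying (A4) and (A5)."""
    L = n.bit_length()  # len(n)
    r = 2
    while gcd(n, r) != 1 or mult_order(n, r) <= 4 * L * L:
        r += 1  # search for r with ord_r(n) > 4 len(n)^2, i.e., (A4)
    ell = 2 * L * isqrt(r) + 1  # ell > 2 len(n) floor(sqrt r), i.e., (A5)
    p = smallest_prime_factor(n)
    t = subgroup_order([n, p], r)
    return 2 ** min(t, ell) - 1 > n ** (2 * isqrt(t))
```

Because the lemma uses only (A4) and (A5), the check is expected to succeed for every $n > 1$; Python's arbitrary-precision integers make the two sides exact.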

Exercise 21.1. Show that if Conjecture 5.24 is true, then the value of $r$ discovered in step 2 of Algorithm AKS satisfies $r = O(\mathrm{len}(n)^2)$.


21.3 Notes 

The algorithm presented here is due to Agrawal, Kayal, and Saxena [6].

If fast algorithms for integer and polynomial arithmetic are used, then using the analysis presented here, it is easy to see that the algorithm runs in time $O(\mathrm{len}(n)^{10.5+o(1)})$. More generally, it is easy to see that the algorithm runs in time $O(r^{1.5+o(1)} \mathrm{len}(n)^{3+o(1)})$, where $r$ is the value determined in step 2 of the algorithm. In our analysis of the algorithm, we were able to obtain the bound $r = O(\mathrm{len}(n)^5)$, leading to the running-time bound $O(\mathrm{len}(n)^{10.5+o(1)})$. Using a result of Fouvry [37], one can show that $r = O(\mathrm{len}(n)^3)$, leading to a running-time bound of $O(\mathrm{len}(n)^{7.5+o(1)})$. Moreover, if Conjecture 5.24 on the density of Sophie Germain primes were true, then one could show that $r = O(\mathrm{len}(n)^2)$ (see Exercise 21.1), which would lead to a running-time bound of $O(\mathrm{len}(n)^{6+o(1)})$. This running-time bound can be achieved rigorously by a different algorithm, due to Lenstra and Pomerance [62].

Prior to this algorithm, the fastest deterministic, rigorously proved primality test was one introduced by Adleman, Pomerance, and Rumely [5], called the Jacobi sum test, which runs in time

$$O(\mathrm{len}(n)^{c\,\mathrm{len}(\mathrm{len}(\mathrm{len}(n)))})$$

for some constant $c$. Note that for numbers $n$ with fewer than $2^{256}$ bits, the value of $\mathrm{len}(\mathrm{len}(\mathrm{len}(n)))$ is at most 9 (since $\mathrm{len}(n) < 2^{256}$ gives $\mathrm{len}(\mathrm{len}(n)) \le 256$), and so this algorithm runs in time $O(\mathrm{len}(n)^{9c})$ for any $n$ that one could ever actually write down.

We also mention the earlier work of Adleman and Huang [3], who gave a probabilistic algorithm whose output is always correct, and which runs in expected polynomial time (i.e., a Las Vegas algorithm, in the parlance of §9.7).
A1. Some handy inequalities. The following inequalities involving exponentials and logarithms are very handy.

(i) For all real numbers $x$, we have

$$1 + x \le e^x,$$

or, taking logarithms, for $x > -1$, we have

$$\log(1 + x) \le x.$$

(ii) For all real numbers $x \ge 0$, we have

$$e^{-x} \le 1 - x + x^2/2,$$

or, taking logarithms,

$$-x \le \log(1 - x + x^2/2).$$

(iii) For all real numbers $x$ with $0 \le x \le 1/2$, we have

$$1 - x \ge e^{-x - x^2} \ge e^{-2x},$$

or, taking logarithms,

$$\log(1 - x) \ge -x - x^2 \ge -2x.$$

(i) and (ii) follow easily from Taylor’s formula with remainder, applied to the function $e^x$, while (iii) may be proved by expanding $\log(1 - x)$ as a Taylor series, and making a simple calculation.
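The inequalities are easy to spot-check in floating point. The Python sketch below (function name hypothetical) evaluates (i) through (iii) at the points $x = i/64$, which are exactly representable as floats. Such a check is a sanity test, not a proof; it deliberately avoids tiny non-zero $x$, where the gap between the two sides (of order $x^2$) falls below floating-point precision.

```python
import math


def inequalities_hold_on_grid():
    """Spot-check A1 (i)-(iii) at the exactly representable points x = i/64."""
    for i in range(-63, 129):  # x ranges over [-63/64, 2]
        x = i / 64
        if not (1 + x <= math.exp(x)):                    # (i)
            return False
        if not (math.log1p(x) <= x):                      # (i), log form, x > -1
            return False
        if x >= 0 and not (math.exp(-x) <= 1 - x + x * x / 2):  # (ii)
            return False
        if 0 <= x <= 0.5 and not (1 - x >= math.exp(-x - x * x) >= math.exp(-2 * x)):  # (iii)
            return False
    return True
```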

A2. Binomial coefficients. For integers $n$ and $k$, with $0 \le k \le n$, one defines the binomial coefficient

$$\binom{n}{k} := \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$

We have the identities

$$\binom{n}{n} = \binom{n}{0} = 1,$$

and for $0 < k < n$, we have Pascal’s identity

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k},$$

which may be verified by direct calculation. From these identities, it follows that $\binom{n}{k}$ is an integer, and indeed, is equal to the number of subsets of $\{1, \ldots, n\}$ of cardinality $k$. The usual binomial theorem also follows as an immediate consequence: for all numbers $a, b$, and for all positive integers $n$, we have the binomial expansion

$$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k.$$

It is also easily verified, directly from the definition, that

$$\binom{n}{k} < \binom{n}{k+1} \quad \text{for } 0 \le k < (n-1)/2,$$

$$\binom{n}{k} > \binom{n}{k+1} \quad \text{for } (n-1)/2 < k < n, \text{ and}$$

$$\binom{n}{k} = \binom{n}{n-k} \quad \text{for } 0 \le k \le n.$$

In other words, if we fix $n$, and view $\binom{n}{k}$ as a function of $k$, then this function is increasing on the interval $[0, n/2]$, decreasing on the interval $[n/2, n]$, and its graph is symmetric with respect to the line $k = n/2$.
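These facts are easy to confirm for small $n$ with Python’s `math.comb` (available since Python 3.8). The sketch below (function name hypothetical) is a finite check, not a proof. It tests the boundary identities, Pascal’s identity, symmetry, the increasing/decreasing pattern, the subset-counting interpretation for $n \le 8$, and the binomial expansion with $a = b = 1$, which gives $\sum_k \binom{n}{k} = 2^n$.

```python
from itertools import combinations
from math import comb


def binomial_facts_hold(N=25):
    """Check the identities and monotonicity facts of A2 for 1 <= n <= N."""
    for n in range(1, N + 1):
        if not comb(n, 0) == comb(n, n) == 1:
            return False
        for k in range(1, n):  # Pascal's identity, 0 < k < n
            if comb(n, k) != comb(n - 1, k - 1) + comb(n - 1, k):
                return False
        for k in range(n + 1):  # symmetry
            if comb(n, k) != comb(n, n - k):
                return False
        for k in range(n):  # increasing, then decreasing
            if 2 * k < n - 1 and not comb(n, k) < comb(n, k + 1):
                return False
            if 2 * k > n - 1 and not comb(n, k) > comb(n, k + 1):
                return False
        if n <= 8 and any(len(list(combinations(range(1, n + 1), k))) != comb(n, k)
                          for k in range(n + 1)):  # number of k-subsets of {1..n}
            return False
        if sum(comb(n, k) for k in range(n + 1)) != 2 ** n:  # binomial theorem, a = b = 1
            return False
    return True
```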

A3. Countably infinite sets. Let $\mathbb{Z}^+ := \{1, 2, 3, \ldots\}$, the set of positive integers. A set $S$ is called countably infinite if there is a bijection $f : \mathbb{Z}^+ \to S$; in this case, we can enumerate the elements of $S$ as $x_1, x_2, x_3, \ldots$, where $x_i := f(i)$.

A set $S$ is called countable if it is either finite or countably infinite.

For a non-empty set $S$, the following conditions are equivalent:

• $S$ is countable;

• there is a surjective function $g : \mathbb{Z}^+ \to S$;

• there is an injective function $h : S \to \mathbb{Z}^+$.

The following facts can be easily established:

(i) if $S_1, \ldots, S_n$ are countable sets, then so are $S_1 \cup \cdots \cup S_n$ and $S_1 \times \cdots \times S_n$;

(ii) if $S_1, S_2, S_3, \ldots$ are countable sets, then so is $\bigcup_{i=1}^{\infty} S_i$;

(iii) if $S$ is a countable set, then so is the set $\bigcup_{i=0}^{\infty} S^{\times i}$ of all finite sequences of elements in $S$.

Some examples of countably infinite sets: $\mathbb{Z}$, $\mathbb{Q}$, the set of all finite bit strings. Some examples of uncountable sets: $\mathbb{R}$, the set of all infinite bit strings.
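Fact (i) for products rests on an explicit enumeration of pairs. The Python sketch below shows one standard choice, a variant of Cantor’s pairing function that walks the anti-diagonals $i + j = 2, 3, 4, \ldots$ in turn; it is a bijection $\mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{Z}^+$. The function names `pair` and `unpair` are hypothetical, and this is only one of many possible enumerations.

```python
from math import isqrt


def pair(i, j):
    """Bijection Z+ x Z+ -> Z+, enumerating the anti-diagonals i + j = 2, 3, 4, ... in turn."""
    d = i + j - 2                # (1, 1) lies on anti-diagonal 0
    return d * (d + 1) // 2 + i  # d(d+1)/2 pairs come before anti-diagonal d


def unpair(k):
    """Inverse of pair: recover (i, j) from k >= 1."""
    d = (isqrt(8 * (k - 1) + 1) - 1) // 2  # largest d with d(d+1)/2 <= k - 1
    i = k - d * (d + 1) // 2
    return i, d + 2 - i
```

The enumeration begins $(1,1), (1,2), (2,1), (1,3), (2,2), (3,1), \ldots$; applying it repeatedly handles any finite product $S_1 \times \cdots \times S_n$ once each $S_i$ has been enumerated.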

A4. Integrating piece-wise continuous functions. In discussing the Riemann integral $\int_a^b f(t)\,dt$, many introductory calculus texts only discuss in any detail the case where the integrand $f$ is continuous on the closed interval $[a, b]$, in which case the integral is always well defined. However, the Riemann integral is well defined for much broader classes of functions. For our purposes in this text, it is convenient and sufficient to work with integrands that are piece-wise continuous on $[a, b]$, which means that there exist real numbers $x_0, x_1, \ldots, x_k$ and functions $f_1, \ldots, f_k$, such that $a = x_0 \le x_1 \le \cdots \le x_k = b$, and for each $i = 1, \ldots, k$, the function $f_i$ is continuous on the closed interval $[x_{i-1}, x_i]$, and agrees with $f$ on the open interval $(x_{i-1}, x_i)$. In this case, $f$ is integrable on $[a, b]$, and indeed

$$\int_a^b f(t)\,dt = \sum_{i=1}^{k} \int_{x_{i-1}}^{x_i} f_i(t)\,dt.$$

It is not hard to prove this equality, using the basic definition of the Riemann integral; however, for our purposes, we can also just take the value of the expression on the right-hand side as the definition of the integral on the left-hand side.

If $f$ is piece-wise continuous on $[a, b]$, then it is also bounded on $[a, b]$, meaning that there exists a positive number $M$ such that $|f(t)| \le M$ for all $t \in [a, b]$, from which it follows that $\left|\int_a^b f(t)\,dt\right| \le M(b - a)$.

We also say that $f$ is piece-wise continuous on $[a, \infty)$ if for all $b \ge a$, $f$ is piece-wise continuous on $[a, b]$. In this case, we may define the improper integral $\int_a^{\infty} f(t)\,dt$ as the limit, as $b \to \infty$, of $\int_a^b f(t)\,dt$, provided the limit exists.

A5. Estimating sums by integrals. Using elementary calculus, it is easy to estimate a sum over a monotone sequence in terms of a definite integral, by interpreting the integral as the area under a curve. Let $f$ be a real-valued function that is (at least piece-wise) continuous and monotone on the closed interval $[a, b]$, where $a$ and $b$ are integers. Then we have

$$\min(f(a), f(b)) \le \sum_{i=a}^{b} f(i) - \int_a^b f(t)\,dt \le \max(f(a), f(b)).$$
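For instance, with the decreasing function $f(t) = 1/t$ on $[a, b]$ (where $a \ge 1$), the bound says $1/b \le \sum_{i=a}^{b} 1/i - \log(b/a) \le 1/a$; with $a = 1$ this pins the gap between the harmonic number $H_b$ and $\log b$ between $1/b$ and $1$. The Python sketch below (function name hypothetical) checks this numerically for a few ranges.

```python
import math


def sum_minus_integral(a, b):
    """For f(t) = 1/t on [a, b] (a >= 1): return sum_{i=a}^{b} f(i) - integral_a^b f(t) dt."""
    s = sum(1.0 / i for i in range(a, b + 1))
    return s - (math.log(b) - math.log(a))
```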


A6. Infinite series. Consider an infinite series $\sum_{i=1}^{\infty} x_i$. It is a basic fact from calculus that if the $x_i$’s are non-negative and $\sum_{i=1}^{\infty} x_i$ converges to a value $y$, then any infinite series whose terms are a rearrangement of the $x_i$’s converges to the same value $y$.

If we drop the requirement that the $x_i$’s are non-negative, but insist that the series $\sum_{i=1}^{\infty} |x_i|$ converges, then the series $\sum_{i=1}^{\infty} x_i$ is called absolutely convergent. In this case, not only does the series $\sum_{i=1}^{\infty} x_i$ converge to some value $y$, but any infinite series whose terms are a rearrangement of the $x_i$’s also converges to the same value $y$.

A7. Double infinite series. The topic of double infinite series may not be discussed in a typical introductory calculus course; we summarize here the basic facts that we need.

Suppose that $\{x_{ij}\}_{i,j=1}^{\infty}$ is a family of non-negative real numbers such that for each $i$ the series $\sum_j x_{ij}$ converges to a value $r_i$, and for each $j$ the series $\sum_i x_{ij}$ converges to a value $c_j$. Then we can form the double infinite series $\sum_i \sum_j x_{ij} = \sum_i r_i$ and the double infinite series $\sum_j \sum_i x_{ij} = \sum_j c_j$. If $(i_1, j_1), (i_2, j_2), \ldots$ is an enumeration of all pairs of indices $(i, j)$, we can also form the single infinite series $\sum_k x_{i_k j_k}$. We then have $\sum_i \sum_j x_{ij} = \sum_j \sum_i x_{ij} = \sum_k x_{i_k j_k}$, where the three series either all converge to the same value, or all diverge. Thus, we can reverse the order of summation in a double infinite series of non-negative terms. If we drop the non-negativity requirement, the same result holds provided $\sum_k |x_{i_k j_k}| < \infty$.

Now suppose $\sum_i a_i$ is an infinite series of non-negative terms that converges to $A$, and that $\sum_j b_j$ is an infinite series of non-negative terms that converges to $B$. If $(i_1, j_1), (i_2, j_2), \ldots$ is an enumeration of all pairs of indices $(i, j)$, then $\sum_k a_{i_k} b_{j_k}$ converges to $AB$. Thus, we can multiply term-wise infinite series with non-negative terms. If we drop the non-negativity requirement, the same result holds provided $\sum_i a_i$ and $\sum_j b_j$ converge absolutely.

A8. Convex functions. Let $I$ be an interval of the real line (either open, closed, or half open, and either bounded or unbounded), and let $f$ be a real-valued function defined on $I$. The function $f$ is called convex on $I$ if for all $x_0, x_2 \in I$, and for all $t \in [0, 1]$, we have

$$f(t x_0 + (1 - t) x_2) \le t f(x_0) + (1 - t) f(x_2).$$

Geometrically, convexity means that for every three points $P_i = (x_i, f(x_i))$, $i = 0, 1, 2$, where each $x_i \in I$ and $x_0 < x_1 < x_2$, the point $P_1$ lies on or below the line through $P_0$ and $P_2$.

We state here the basic analytical facts concerning convex functions:

(i) if $f$ is convex on $I$, then $f$ is continuous on the interior of $I$ (but not necessarily at the endpoints of $I$, if any);

(ii) if $f$ is continuous on $I$ and differentiable on the interior of $I$, then $f$ is convex on $I$ if and only if its derivative is non-decreasing on the interior of $I$.
Entries are listed in order of appearance.


log: natural logarithm, xiv

exp: exponential function, xiv

$\emptyset, \in, \subseteq, \subsetneq, \cup, \cap, \setminus, |\cdot|$: set notation, xiv

$S_1 \times \cdots \times S_n$, $S^{\times n}$: Cartesian product, xiv

$\{x_i\}_{i \in I}$: family, xv

sequence, xv

$\mathbb{Z}$: the integers, xv

$\mathbb{Q}$: the rationals, xv

$\mathbb{R}$: the reals, xv

$\mathbb{C}$: the complex numbers, xv

$\infty$: arithmetic with infinity, xvi

$[a, b]$, $(a, b)$, etc.: interval notation, xvi

$f(S)$: image of a set, xvi

$f^{-1}$: pre-image of a set/inverse function, xvi

$f \circ g$: function composition, xvi

$a \mid b$: $a$ divides $b$, 1

$\lfloor x \rfloor$: floor of $x$, 4

$\lceil x \rceil$: ceiling of $x$, 4

$a \bmod b$: integer remainder, 4

$a\mathbb{Z}$: ideal generated by $a$, 5

$I_1 + I_2$: sum of ideals, 6

gcd: greatest common divisor, 7

$\nu_p(n)$: largest power to which $p$ divides $n$, 10

lcm: least common multiple, 11

$a \equiv b \pmod{n}$: $a$ congruent to $b$ modulo $n$, 16

$b/a \bmod n$: integer remainder, 22

$a^{-1} \bmod n$: integer modular inverse, 22

$[a]_n$, $[a]$: residue class of $a$ modulo $n$, 25

$\mathbb{Z}_n$: residue classes modulo $n$, 25

$\mathbb{Z}_n^*$: invertible residue classes, 28

$\varphi(n)$: Euler’s phi function, 31

$(\mathbb{Z}_n^*)^m$: $m$th powers in $\mathbb{Z}_n^*$, 36

$\mu(n)$: Möbius function, 46

$O, \Omega, \Theta, o$: asymptotic notation, 50

$\mathrm{len}(a)$: length (in bits) of an integer, 62

$\mathrm{rep}(\alpha)$: canonical representative of $\alpha \in \mathbb{Z}_n$, 65

$\pi(x)$: number of primes up to $x$, 104

$\vartheta$: Chebyshev’s theta function, 107


li: logarithmic integral, 117
$\zeta(s)$: Riemann’s zeta function, 118
$\mathrm{Map}(I, G)$: group of functions $f : I \to G$, 131
$mG$: the subgroup $\{ma : a \in G\}$, 133
$G\{m\}$: the subgroup $\{a \in G : ma = 0_G\}$, 133
$G^m$: multiplicative subgroup $\{a^m : a \in G\}$, 133
$H_1 + H_2$: sum of subgroups, 136
$H_1 H_2$: product of subgroups, 136
$a \equiv b \pmod{H}$: $a - b \in H$, 137
$[a]_H$: coset of $H$ containing $a$, 138
$G/H$: quotient group, 140
$[G : H]$: index, 140
$\operatorname{Ker} \rho$: kernel, 143
$\operatorname{Im} \rho$: image, 143
$G \cong G'$: isomorphic groups, 146
$\mathrm{Hom}(G, G')$: group homomorphisms $G \to G'$, 151
$\langle a \rangle$: subgroup generated by $a$, 153
$\langle a_1, \ldots, a_k \rangle$: subgroup generated by $a_1, \ldots, a_k$, 153
$\bar{\alpha}$: complex conjugate of $\alpha$, 167
$N(\alpha)$: norm of $\alpha \in \mathbb{C}$, 167
$\mathrm{Map}(I, R)$: ring of functions $f : I \to R$, 168
$AB$: ring-theoretic product, 169
$a \mid b$: $a$ divides $b$, 170
$R^*$: multiplicative group of units of $R$, 170
$\mathbb{Z}[i]$: Gaussian integers, 174
$\mathbb{Q}^{(m)}$: $\{a/b : \gcd(b, m) = 1\}$, 174
$R[X]$: ring of polynomials, 176
$\deg(g)$: degree of a polynomial, 177
$\mathrm{lc}(g)$: leading coefficient of a polynomial, 177
$g \bmod h$: polynomial remainder, 178
$aR$: ideal generated by $a$, 186
$(a_1, \ldots, a_k)$: ideal generated by $a_1, \ldots, a_k$, 186
$R/I$: quotient ring, 187
$a \equiv b \pmod{d}$: $a - b \in dR$, 187
$[a]_d$: the residue class $[a]_{dR}$, 187
$R[\alpha]$: smallest subring containing $R$ and $\alpha$, 192
$R[\alpha_1, \ldots, \alpha_n]$: smallest subring containing $R$ and $\alpha_1, \ldots, \alpha_n$, 193
$R \cong R'$: isomorphic rings, 195
$\mathsf{P}$: probability distribution, 207
$\mathsf{P}_1 \mathsf{P}_2$, $\mathsf{P}^n$: product distribution, 211
$\mathsf{P}[A \mid B]$: conditional probability of $A$ given $B$, 214
$\mathsf{E}[X]$: expected value of $X$, 233
$\mathrm{Var}[X]$: variance of $X$, 235
$\mathsf{E}[X \mid B]$: conditional expectation of $X$ given $B$, 237
$\Delta[X; Y]$: statistical distance, 260
$y \xleftarrow{R} \{0, 1\}$, $y \xleftarrow{R} \{0, 1\}^{\ell}$: assign random bit(s), 278
$y \xleftarrow{R} T$: assign random element of $T$, 287
$\log_\gamma \alpha$: discrete logarithm, 327
$(a \mid p)$: Legendre symbol, 342
$(a \mid n)$: Jacobi symbol, 346
$J_n$: Jacobi map, 347
$\mathrm{Map}(I, M)$: $R$-module of functions $f : I \to M$, 360
$cM$: submodule $\{c\alpha : \alpha \in M\}$, 361
$M\{c\}$: submodule $\{\alpha \in M : c\alpha = 0_M\}$, 361
$\alpha R$: submodule $\{c\alpha : c \in R\}$, 361
$\langle \alpha_1, \ldots, \alpha_k \rangle_R$: submodule generated by $\alpha_1, \ldots, \alpha_k$, 361
polynomials of degree less than t, 361
$M/N$: quotient module, 362
$M \cong M'$: isomorphic modules, 365
$\mathrm{Hom}_R(M, M')$: $R$-linear maps $M \to M'$, 366
$\dim_F(V)$: dimension, 372